FOREWORD


This book is the second in the Purdue Infor- 
mation Literacy Handbooks series. The book 
fulfills the purpose of the series, which is to 
promote evidence-based practice in teaching 
information literacy competencies through the 
lens of different academic disciplines. Informa- 
tion literacy implies the ability to find, manage, 
and use information in any format, and editors 
Carlson and Johnston apply it to the format 
of raw data. They coined the term data infor- 
mation literacy as an application of information 
literacy in the context of research. 

Since much data is accessible on the Web 
now and federal agencies are encouraging reuse 
of data, rather than re-creating data sets, librar- 
ians have embraced the opportunity to apply 
the organization and management principles of 
library and information science to data. 

Data Information Literacy: Librarians, Data, 
and the Education of a New Generation of Re- 
searchers is a timely work based on research 
funded by the Institute of Museum and Li- 
brary Services. Carlson and Johnston included 
librarians who worked with different scientific
disciplines in the Data Information Literacy
(DIL) project to write for this publication. 
Through interviews, the voices of faculty and 
graduate students revealed the need for a more 
effective way to learn DIL competencies and 
integrate them into their practice. The DIL 
project revealed specific skill gaps that graduate 
students in the sciences and engineering have 
related to managing, publishing, and preserv- 
ing data sets for research. Librarians developed 
and assessed tailored educational strategies for 
addressing these gaps in five settings. 

Carlson and Johnston make a strong case for 
the role of librarians in teaching graduate stu- 
dents to manage, publish, and preserve data. 
They and the chapter authors give advice based 
on their experience for academic librarians to 
establish DIL programs at their institutions. 

This handbook will have value for librar- 
ians and library administrators in colleges 
and universities in which students participate 
in faculty research projects. With it, they can 
develop and implement plans to address an 
important, unmet educational need. Although 
this book focuses on some of the science and
engineering disciplines, those in the humani- 
ties and social sciences may be able to apply 
the methods used for identifying and addressing
educational issues in their areas. This book
will support library administrators who want
their libraries to participate in the educational
and research mission of their institutions. It
will give practitioners guidance for developing
such an effort.

Sharon Weiner, EdD, MLS
Series Editor
Professor and W. Wayne Booker Chair in Information Literacy, Purdue University Libraries
Vice President, National Forum on Information Literacy


August 2014 


PREFACE 


We did not set out to write a book on the sub- 
ject of data information literacy. Our initial 
intent was to explore the educational needs of 
graduate students in working with data and to 
report our findings to the research library com- 
munity. When we started our investigations in 
2010, there was a dawning recognition among 
academic librarians that the rising expectations 
for researchers to manage, document, organize, 
disseminate, and preserve their data in ways 
that would contribute to the advancement of 
their fields would require novel educational 
initiatives and programs. More importantly, we 
recognized that this was an area where librar- 
ians could potentially make important contri- 
butions. At the time, there were only a few ex- 
amples of educational programs that addressed 
issues relating to data management and cura- 
tion and very little practical guidance on what 
content should be taught. 

Our early investigation into articulating 
“data information literacy,” or DIL as we came 
to call it, was tremendously helpful for us in 
better understanding the needs of faculty and
students in this space. However, the more apparent
the needs surrounding educational programming on
data issues became, the more questions we had.
The 12 DIL competencies, which were based on prior
research by a Purdue University team,
helped us to see possibilities for developing
educational programming, but what would our 
programming actually include, what pedago- 
gies could be applied, and what would we as 
librarians be qualified to teach to researchers? 
In short, how could we apply the theoretical 
competencies for DIL in ways that would have 
a real-world impact on students? Thanks to the 
generous support of the Institute of Museum 
and Library Services, we had the opportunity 
to seek answers to these questions through de- 
veloping the Data Information Literacy project. 

This book contains descriptions of our work 
in carrying out the DIL project, but our goal 
in sharing our findings in this way goes far 
beyond simply reporting our experiences. We 
believe that DIL represents an opportunity to 
leverage the expertise, knowledge, and skill 
sets of librarians and apply them to an area of 
growing need. Fulfilling this need represents a
potentially significant advancement for librar- 
ians in engaging in both the teaching and re- 
search missions of the academy. To further this 
goal, we share our findings and our experiences 
from a practical approach, in ways that will en- 
able librarians and other information profes- 
sionals to build on our work and to incorporate 
what we have learned into their own DIL pro- 
grams as appropriate. It is our sincere hope that 
this book will serve not only as a resource to 
those who seek to develop DIL initiatives and 
programs at their institutions, but as a means 
to further a discussion on the direction of DIL 
and how it could take shape as a component of 
services offered by the library. 
INTRODUCTION


Jake Carlson, University of Michigan 
Lisa R. Johnston, University of Minnesota 


“The data management skills that students need are many and they don’t
necessarily have them and they don’t necessarily acquire them in the time
of the project.”


— FACULTY MEMBER INTERVIEWED IN THE 


DATA INFORMATION LITERACY PROJECT


“Finally, I’m finding that by taking this class and doing these readings
I’m becoming more aware of different data management services in my
own field.”


— GRADUATE STUDENT'S EVALUATION OF A 


DATA INFORMATION LITERACY COURSE


We developed the Data Information Literacy 
(DIL) project to answer two overarching ques- 
tions. First, what data management and cura- 
tion skills are needed by future scientists to 
fulfill their professional responsibilities and 
take advantage of collaborative research op- 
portunities in e-science and technology-driven 
research environments? Second, how can aca- 
demic librarians apply their expertise in infor- 
mation retrieval, organization, dissemination, 
and preservation to teaching these competencies
to students? In answering these questions,
we aimed to build a foundation in the
library community for teaching DIL competencies,
to teach students DIL competencies
appropriate to their discipline, and to develop 
a robust process for librarians to develop DIL 
curricula and programming. We accomplished 
these goals through designing, constructing, 
implementing, and assessing programs to teach 
a selection of the DIL competencies to gradu- 
ate students to bolster productivity in their cur- 
rent work and foster success in their eventual 
careers. In many ways, we successfully accom- 
plished what we set out to do. Students and 
faculty who participated in our programs are 
better able to identify and articulate their data
needs (for example, in constructing a National 
Science Foundation [NSF] data management 
plan [DMP]), and are now better equipped to 
address these needs. However, there is much 
more work to be done. In addition to increas- 
ing our collective capacity to develop and offer 
effective DIL programs, we need to raise aware- 
ness of larger issues and enable participants in 
our programs to contribute to their disciplines’ 
efforts to address data management and cura- 
tion issues at a community level. It is our hope 
that this next important step will be facilitated 
by the experiences, examples, and informative
guide included in this volume, so that
academic librarians may continue this work at 
their own institutions. 


NEW ROLES FOR LIBRARIANS: DATA 
MANAGEMENT AND CURATION 


Computationally intensive research, also 
known as cyberinfrastructure or e-science, de- 
pends on ready access to high-quality, well- 
described data sets. However, the capacity to 
manage and curate research data has not kept 
pace with the ability to produce them (Hey & 
Hey, 2006). In recognition of this gap, the NSF 
and other funding agencies are now mandating 
that every grant proposal must include a DMP 
(NSF, 2010). These mandates highlight the
benefits of producing well-described data that 
can be shared, understood, and reused by oth- 
ers, but they generally offer little in the way of 
guidance or instruction on how to address the 
inherent issues and challenges researchers face 
in complying. Even with increasing expecta- 
tions from funding agencies and research com- 
munities, such as the announcement by the 
White House for all federal funding agencies
to better share research data (Holdren, 2013),
the lack of data curation services tailored for 
the “small sciences,” the single investigators or 
small labs that typically comprise science prac- 
tice at universities, has been identified as a bar- 
rier in making research data more widely avail- 
able (Cragin, Palmer, Carlson, & Witt, 2010). 

Academic libraries, which support the re- 
search and teaching activities of their home 
institutions, are recognizing the need to de- 
velop services and resources in support of the 
evolving demands of the information age. The 
curation of research data is an area that librar- 
ians are well suited to address, and a num- 
ber of academic libraries are taking action to 
build capacity in this area (Soehner, Steeves, & 
Ward, 2010). 


AN UNMET NEED: EDUCATIONAL 
PROGRAMMING ON DATA 


The NSF's (2007) Cyberinfrastructure Vision for 
21st Century Discovery advocated that 


curricula must also be reinvented to exploit 
emerging cyberinfrastructure capabilities. 
The full engagement of students is vitally im- 
portant since they are in a special position to 
inspire future students with the excitement 
and understanding of cyberinfrastructure- 
enabled scientific inquiry and learning. On- 
going attention must be paid to the education 
of the professionals who will support, deploy, 
develop, and design current and emerging
cyberinfrastructure. (p. 38)


Despite the articulated need for educa- 
tional initiatives focused on e-science, there 
has been little attention to ensuring that graduate
students learn the skills required for the
management, organization, access, reuse, and
preservation of research data as a component 
of their educational program. Several institu- 
tions, including Indiana University and Rens- 
selaer Polytechnic Institute, have introduced 
stand-alone courses to provide such an educa- 
tion (Indiana University Pervasive Technology 
Institute, 2010; TWC, n.d.). However, stu- 
dents may hesitate to enroll in courses listed 
outside of their discipline and may not gain a 
full understanding of the expectations, norms, 
and best practices of their discipline from such 
general courses. 

A few information schools, including
the University of North Carolina at Chapel
Hill and the University of Illinois at Urbana-Champaign,
developed programs to teach concepts
and issues in data curation (GSLIS, 2010,
2011; Tibbo & Lee, 2010). These programs and 
workshops illuminate the potential roles of li- 
brarians in data curation and management and 
have done much to advance the field of librarianship.
However, these courses are isolated from
scientific activities and are generally intended 
to train not disciplinary specialists, but infor- 
mation professionals. Our approach in the DIL 
project has been to forge strong relationships 
with the disciplines through partnerships with 
science faculty and graduate students through 
in-depth interactions to develop a rich under- 
standing of their disciplinary and real-world 
needs. Thus, the main difference between the 
programming done by information schools and 
the DIL project is our focus on the frontline 
researcher and student, making sure that our 
content is relevant, useful to their work, and de- 
livered successfully. Data curation curricula at 
information schools center on production of in- 
formation while the Association of College and 
Research Libraries’ (ACRL’s) 2000 information
literacy standards focus on the consumption of 
information. But science research faculty and
students need a curriculum that balances both
perspectives and concentrates on specific, prac- 
tical skills needed for working with data. 


REIMAGINING AN EXISTING ROLE 
OF LIBRARIANS: TEACHING 
INFORMATION LITERACY SKILLS 


Many academic librarians have embraced their 
role as educators through information literacy 
programs at their institutions. Information lit- 
eracy centers on teaching students “the abil- 
ity to recognize when information is needed 
and have the ability to locate, evaluate and use 
effectively the needed information” (ACRL, 
2000, p. 2), with the ultimate goal of enabling 
lifelong learning. Ideally information literacy 
programs are targeted to the specific context 
of the intended audience, are in-depth in their 
coverage, and are integrated within courses 
and curricula. 

The DIL project was structured on a belief 
that there is great potential to match existing 
librarians’ expertise in information literacy 
with support for e-science. By combining the
use-based standards of information literacy
with skill development across the whole data
life cycle, we sought to support the practices
of science by developing a DIL curriculum
and providing training for higher education
students and researchers. We increased capacity
and enabled comparative work by involving
several institutions in developing
instruction in DIL. Finally, we grounded the
instruction in the real-world needs as articu- 
lated by active researchers and their students 
from a variety of fields. 


THE FRAMEWORK FOR THIS BOOK 


This book is divided into three parts. Part I, 
“Making the Case for Data Information Literacy,”
follows the history and evolution of this
emerging field in academic librarianship and in 
the DIL project specifically. Part II, “Data
Information Literacy Disciplinary Case Studies,”
describes five DIL disciplinary case studies that 
cover a range of student and faculty needs with 
distinct approaches to library-based education 
in DIL. Part III, “Moving Forward,” includes a 
robust guide for practicing librarians seeking to 
build DIL programs and an exploration of how 
DIL may develop in the future. 


Part I: Making the Case for 
Data Information Literacy 


We begin by looking closely at the research that 
led to the development of DIL as a concept. In 
Chapter 1, we reprint an article that first ar- 
ticulated the 12 DIL competencies (Carlson, 
Fosmire, Miller, & Sapp Nelson, 2011). The 
research behind the development of the 12 DIL 
competencies is explained, and a brief compari- 
son is performed between DIL and information 
literacy, as defined by the 2000 ACRL standards. 

Chapter 2 provides a description of the 
Institute of Museum and Library Services-funded
DIL project, which ran from 2011 to
2014, and applies the 12 DIL competencies 
in practice. This chapter includes our thinking 
and approaches toward engaging researchers 
and students with the 12 competencies, a re- 
view of the literature on a variety of educational 
approaches to teaching data management and
curation to students, and an articulation of our
key assumptions in forming the DIL project.

Chapter 3 contains an in-depth analysis of
each of the 12 DIL competencies from the 
perspective of our faculty partners in the DIL 
project and some of their graduate students. 
Here we compared and analyzed the qualitative 
aspects of the interviews we conducted to gain 
a better overall understanding of their needs. 
We compared the responses from faculty and 
graduate students for each of the competencies 
and discussed the differences between them. As
with this introduction, portions of Chapters 2 
and 3 originally appeared in a 2013 issue of the 
International Journal of Digital Curation. 


Part II: Data Information Literacy
Disciplinary Case Studies 


This section of the book includes the DIL case 
studies that resulted from the work of the five 
faculty-librarian partnerships in the DIL proj- 
ect. The method of case studies was chosen 
to provide a disciplinary look at the needs of 
students and faculty in the DIL competencies. 
We selected case studies as our research ap- 
proach as they emphasize gathering individual 
perceptions through personal interactions for 
analysis (Blatter, 2008). Each of the five teams 
defined learning outcomes and developed 
pedagogies for teaching and evaluating their 
students’ learning on the basis of the particu- 
lar needs identified in the interviews. The five 
approaches explored DIL training in a variety 
of settings while remaining grounded in disci- 
plinary and local needs. In these case studies, 
each team detailed how they developed their 
DIL program, the educational interventions 
they employed, the results of the assessments 
they conducted, and their recommendations 
for future iterations of their program. 

Chapter 4 reports on the experiences of Cornell
University in developing a 6-week, for-credit
course for graduate students in the Department
of Natural Resources. This case study involves 
a research lab that collects a variety of different 
data pertaining to fishing and water quality over 
a number of years, emphasizing the crucial need 
for data curation and maintenance over the ex- 
tended life span of the data. Because these lon- 
gitudinal data cannot be reproduced, acquiring 
the skills necessary to work with databases and 
to handle data entry was described as essential. 
Interventions took place in a classroom set- 
ting through a spring 2013 semester one-credit 
course entitled Managing Data to Facilitate 
Your Research taught by this DIL team.

Chapter 5 presents how the Carlson and
Sapp Nelson DIL team from Purdue University 
worked with an engineering service-learning 
center to develop an approach to teach students 
how to document software code and project 
work. This team formed a collaboration with 
the Engineering Projects in Community Service 
(EPICS) center that provided undergraduate 
students practical experience through applying 
their engineering skills to assist local community 
organizations. Many of the service projects in- 
volved developing and delivering software code 
as a component of the completed project. This 
chapter details the DIL team’s embedded librar- 
ian approach of working with the teaching as- 
sistants (TAs) to develop tools and resources to 
teach undergraduate students data management 
skills as a part of their EPICS experience. It
also reveals significant concerns about students’
organization and documentation skills. Lack of 
organization and documentation presents a bar- 
rier to (a) successfully transferring code to new 
students who will continue its development, 
(b) delivering code and other project outputs 
to the community client, and (c) enabling the
center administration to understand and evaluate
the impact on student learning. By integrating
themselves into existing structures to enable
close collaborations, the team developed short
skill sessions to deliver instruction to team leaders,
crafted a rubric for measuring the quality
of documenting code and other data, served as 
critics in student design reviews, and attended 
student lab sessions to observe and consult on 
student work. 

Chapter 6 describes the work done by the 
Bracke and Fosmire DIL team at Purdue to 
teach metadata and other DIL competencies 
to graduate students in an agricultural and 
biological engineering lab through a series of 
workshops. An important aspect of the research 
process for the students is comparing observed 
data collected in the field to simulation data 
generated by an array of hydrologic models. 
Although the faculty researcher had created 
formal policies on data management practices 
for his lab, this case study demonstrated that 
students’ adherence to these guidelines was 
limited at best. Similar patterns arose in discus- 
sions concerning the quality of metadata. This 
case study addressed a situation in which stu- 
dents are at least somewhat aware of the need 
to manage their data; however, they did not ad- 
dress this need effectively in practice. This DIL 
team worked with the faculty to implement 
the lab policies in a more structured fashion. 
Their educational program centered on creat- 
ing a checklist to serve as a means of comparing 
individual practice against the recommended 
procedures and to promote a smooth transition 
of the data from student to faculty upon the 
student’s graduation. In support of propagat- 
ing the checklist, this DIL team offered three 
workshops addressing core skills in data man- 
agement, metadata and data continuity, and 
reuse. 

Chapter 7 describes the work from the 
University of Minnesota team to design and 
implement a hybrid course to teach DIL competencies
to graduate students in civil engineering.
Students collected various types of
data, primarily from sensors placed on active
bridges, to study factors that may lead to
bridges being classified as unsound. The fac- 
ulty researcher expressed concern over his stu- 
dents’ abilities to understand and track issues 
affecting the quality of the data, the transfer 
of data from their custody to the custody of 
the lab upon graduation, and the steps neces- 
sary to maintain the value and utility of the 
data over time. To respond to these needs, 
the DIL team developed an online e-learning 
course composed of seven modules with addi- 
tional readings and links. The course was self- 
paced, allowing students to complete it out- 
side of their formal course work and research 
activity, and included an in-person workshop 
session. After completing the course, student 
outcomes included a written DMP for creat- 
ing, documenting, sharing, and preserving 
their data. 

Chapter 8 focuses on the work of the Uni- 
versity of Oregon DIL team and how they 
made the most of a limited window of oppor- 
tunity for teaching crucial data management 
skills. The DIL team in this case study devel- 
oped a one-shot session to address the needs 
of graduate students who were wrapping up a 
grant-funded project. While the research team 
shared field equipment manuals and some 
standard operating procedures via their inter- 
nal project website, they did not have written 
data management guidelines. Their practices 
were promulgated through the experiences 
team members brought to the project or
through team discussions and other informal 
methods. This DIL team assigned independent 
readings followed by a discussion-based instruction
session during a regularly scheduled
research team meeting. The topics of the ses- 
sion included lab notebooks and note taking, 
data backup and storage, file management, 
data repositories, metadata, and links to tools 
and further information. 


Part III: Moving Forward


The third portion of the book leverages the ex- 
periences, efforts, and findings of the DIL proj- 
ect toward advancing the capacity of librarians 
to design and implement their own programs 
and describes an agenda for further research and
exploration in DIL. 

Chapter 9 provides a guide for developing 
DIL programs based on a distillation of the ex- 
periences of the five project teams. To develop 
this guide, each of the project teams read and 
critiqued the case study reports produced by 
the other project teams. These case studies col- 
lectively present patterns and commonalities 
across the five DIL programs which were used 
as the basis for the guide. 

Chapter 10 revisits our findings on the 12 
DIL competencies and suggests areas for fur- 
ther research in developing each of them. Sapp 
Nelson analyzed the eight faculty interviews 
conducted for the DIL project, with a par- 
ticular focus on the skills or components of a 
DIL competency that were identified by the 
researcher beyond the descriptions that we pre- 
sented to them. Her findings provide additional 
insight into faculty perspectives on educating 
graduate students about data management and 
curation issues. This is a reminder that our un- 
derstanding of DIL competencies is evolving. 

Finally, Chapter 11 examines the questions 
and areas of exploration for furthering the 
development of DIL as a role for librarians. 
Carlson draws from two sources of informa- 
tion in charting a course for the growth of DIL 
programs and communities of practice. The 
first is the revision of ACRL information lit- 
eracy standards. ACRL is signaling a need to 
move beyond the checklist-of-skills approach 
that characterized the application of the 2000 
standards (ACRL, 2012). There are indications
that the new framework will center on an
understanding of the environment and context
in which learning takes place, including the 
experiences of the students themselves, and in 
understanding information-related concepts 
that students must acquire before they can 
develop expertise in their field of study. Many 
of the ideas and approaches articulated in the 
framework drafts echo the key assumptions of 
the DIL project and inform new directions for 
developing DIL. 

The second source of information for chart- 
ing future directions in DIL was our Data In- 
formation Literacy Symposium. The DIL teams 
held a 2-day symposium in 2013 at Purdue 
University. The intent of the symposium was to 
explore roles for practicing librarians in teach- 
ing competencies in data management and cu- 
ration and to plant seeds of a community of 
practice on this topic. More than 80 librarians 
registered for this event, and we reached capac- 
ity within 2 days after opening registration. 
We disseminated our findings to attendees for 
their review, and this provoked a great deal of 
thoughtful discussion. Each of the DIL teams 
presented their work and shared their experi- 
ences through presentations, discussions, and 
hands-on exercises. The symposium concluded 
with an articulation of ideas for future direc- 
tions for further developing roles for librarians 
in delivering DIL programs. These articulations 
inform a community-driven map for future re- 
search and directions in DIL. Video and mate- 
rials from the DIL Symposium are available at 
http://docs.lib.purdue.edu/dilsymposium. 


CONCLUSION 


This book articulates an emerging area of oppor- 
tunity for librarians and other information pro- 
fessionals developing programs that introduce 
students in higher education to the knowledge 
and skills needed to work with research data. By
viewing information literacy and data services 
as synergistic activities, we seek to connect the 
progress made and the lessons learned in each 
service area in order to forge strong approaches 
and strategies. The intent of presenting this in- 
formation in one publication is to help librar- 
ians develop practical strategies and approaches 
for developing customized DIL programs using 
the work done in the DIL project as real-world 
case studies. We invite others to build from 
our experiences—both from these case studies 
and through the lens of current understand- 
ings of information literacy—to make recom- 
mendations for future directions and growth of 
DIL. More information about the DIL project 
can be found on the project’s website
(http://datainfolit.org).


NOTE 


Portions of this chapter are reprinted from
Carlson, J., Johnston, L., Westra, B., & Nichols,
M. (2013). Developing an approach for
data management education: A report from
the Data Information Literacy project. International
Journal of Digital Curation, 8(1), 204-217.
http://dx.doi.org/10.2218/ijdc.v8i1.254
INTRODUCTION


The nature and practice of research and schol- 
arship is undergoing dramatic change with the 
advent of ready access to high-bandwidth net- 
works, the capacity to store massive amounts 
of data, and a robust and growing suite of ad- 
vanced informational and computational data 
analysis and visualization tools. The practice of
technology-driven research, known as e-science, 
or more broadly as e-research, has had a trans- 
formative effect in the science and engineer- 
ing fields. E-research applications are growing 
within the humanities and social science dis- 
ciplines as well, where e-research is poised to 
have similar effects on the nature and practice 
of research. 

The complexity and scale of e-research in 
turn requires an evolution of traditional mod- 
els of scholarly communication, library ser- 
vices, and the role of librarians themselves. In 
response, librarians are initiating discussions 
and projects to situate themselves in those ar- 
eas of e-research most in need of library sci- 
ence expertise (Jones, Lougee, Rambo, & Ce- 
leste, 2008). In light of the federal expectation 
that grant proposals have a data management 
plan (DMP; NSF, 2011), libraries are starting 
conversations in their universities to negotiate 
a role in the management of research outputs. 

Data management skills also provide the 
opportunity for an evolution of instruction in 
libraries. Academic libraries offer information 
literacy courses and programs as part of the 
educational mission of the institution. Extend- 
ing information literacy to include programs 
on data management and curation provides a 
logical entry point into increasing the role of 
libraries in supporting e-research. A successful 
education program, however, must be based 
on a firm understanding of current practice 
and standards as well as the needs of the target
audience. There is a lack of research on the
needs of both the researchers and the students 
grappling with these issues in the classroom 
and in the laboratory. The authors attempted 
to address this knowledge gap by gathering data 
from interviews with faculty researchers and 
from the authors’ own Geoinformatics course. 
With this information, the authors proposed 
a model set of outcomes for data information 
literacy (DIL). 


BACKGROUND 


E-Research and Implications for Libraries 


E-research has had a tremendous impact on a 
number of fields, increasing the capabilities of 
researchers to ask new questions and reduce 
the barriers of time and geography to form new 
collaborations. In astronomy, for example, the
National Virtual Observatory (NVO) makes it 
possible for anyone from professional astrono- 
mers to the general public to find, retrieve, 
and analyze vast quantities of data collected 
from telescopes all over the world (Gray, Sza- 
lay, Thakar, Stoughton, & vandenBerg, 2002; 
National Virtual Observatory, 2010). For 
scholars of literature, the HathiTrust Digital 
Library not only provides a tremendous collec- 
tion of scanned and digitized texts, but also its 
Research Center provides tools and computa- 
tional access to scholars seeking to apply data 
mining, visualization, and other techniques to- 
ward the discovery of new patterns and insights 
(HathiTrust Research Center, n.d.). It should 
be no surprise, of course, that such projects 
simultaneously produce and feed upon large 
amounts of data. The capture, dissemination, 
stewardship, and preservation of digital data 
are critical issues in the development and sus- 
tainability of e-research. 


Funding organizations and professional soci- 
eties identified a need for educational initiatives 
to support a workforce capable of e-research 
initiatives. The National Science Foundation 
(NSF) first described the connection between 
e-research and education. The 2003 Atkins 
Report highlighted the need for coordinated, 
large-scale investments in several areas, includ- 
ing developing skilled personnel and facilities 
to provide operational support and services 
(Atkins et al., 2003). In 2005 the National Sci- 
ence Board produced a report that articulated 
existing and needed roles and responsibilities 
required for stewarding data collections, fol- 
lowed by a series of recommendations for tech- 
nical, financial, and policy strategies to guide 
the continued development and use of data col- 
lections (National Science Board, 2005). The 
American Council of Learned Societies issued 
a report in 2006 calling for similar attention 
and investments in developing infrastructure 
and services for e-research in the humanities 
fields (Welshons, 2006). More recently, the 
National Academy of Sciences issued a report 
advocating the stewardship of research data in 
ways that ensured research integrity and data 
accessibility. The recommendations issued in 
the report included the creation of systems for 
the documentation and peer review of data, 
data management training for all researchers, 
and the development of standards and policies 
regarding the dissemination and management 
of data (National Research Council, 2009). 

While the rich, collaborative, and challeng- 
ing paradigm of e-research promises to produce 
important, even priceless, cultural and scientific 
data, librarians are determining their role in 
the curation, preservation, and dissemination 
of these assets. In examining how e-research 
may affect libraries, Hey and Hey argued that 
e-research “is intended to empower scientists to 
do their research in faster, better and different 
ways,” (Hey & Hey, 2006, para. 10). They
particularly emphasized that information and 
social technologies made e-research a more 
communal and participatory exercise, one that 
will see scientists, information technology (IT) 
staff, and librarians working more closely to- 
gether. A particular challenge looming with 
the rise of e-research is the “data deluge”—that
is, the need to store, describe, organize, track, 
preserve, and interoperate data generated by a 
multitude of researchers to make the data ac- 
cessible and usable by others for the long term. 
The sheer quantity of data being generated 
and our current lack of tools, infrastructure, 
standardized processes, shared workflows, and 
personnel who are skilled in managing and cu- 
rating these data pose a real threat to the con- 
tinued development of e-research. 

Gold (2007) provided an outline of the issues 
and opportunities for librarians in e-science. 
Starting from the familiar ground of GIS (geo- 
graphic information systems), bioinformatics, 
and social science data, Gold argued that librar- 
ians working in e-science will develop relation- 
ships—both upstream and downstream of data 
generation—and the effort may be “both re- 
vitalizing and transformative for librarianship” 
(Sec. 2.2, para. 6). Similarly, the Agenda for De- 
veloping E-Science in Research Libraries outlined 
five main outcomes that focused on capacity 
building and service development in libraries 
for supporting e-science (Lougee et al., 2007). 
Walters (2009) further asserted that libraries 
taking “entrepreneurial steps” toward becom- 
ing data curation centers are on the right track, 
reasoning that “a profound role for the univer- 
sity research library in research data curation 
is possible. If the role is not developed, then 
a significant opportunity and responsibility to 
care for unique research information is being 
lost” (p. 85). In other words, the academic li- 
brary community seems reasonably sure that 
supporting e-research is not so novel that it falls
outside of the mission and founding principles 
under which libraries operate. 


Educational Preparation for E-Research 


Ogburn (2010) predicted that e-science 
will quite certainly fail if future generations 
of scholars are not savvy with both the con- 
sumption and production of data and tools. 
“To prepare the next generation of scholars 
the knowledge and skills for managing data 
should become part of an education process 
that includes opportunities for students to 
contribute to the creation and the preserva- 
tion of research in their fields” (p. 244). It is 
not enough to teach students about handling 
incoming data; they must also know, and
practice, how to develop and manage their 
own data with an eye toward the next scientist 
down the line. The Association of Research 
Libraries reported to the NSF in 2006 that 
because 


many scientists continue to use traditional 
approaches to data, i.e., developing custom 
datasets for their own use with little atten- 
tion to long-term reuse, dissemination, and 
curation, a change of behavior is in order. 
. . . [This change] will require a range of efforts,
including . . . perhaps most important
of all, concerted efforts to educate current
and future scientists to adopt better practices.
(Friedlander & Adler, 2006, p. 122) 


The inspiration for the authors’ own work 
on instructional components to e-science 
comes from the NSF's Cyberinfrastructure Vi- 
sion for 21st Century Discovery, in which the 
dramatic rhetoric of revolution and recreation 
does indeed trickle down to education: 


Curricula must also be reinvented to exploit 
emerging cyberinfrastructure capabilities. The 
full engagement of students is vitally impor- 
tant since they are in a special position to in- 
spire future students with the excitement and 
understanding of cyberinfrastructure-enabled 
scientific inquiry and learning. Ongoing at- 
tention must be paid to the education of the 
professionals who will support, deploy, develop,
and design current and emerging cyberinfrastructure.
(National Science Foundation Cyberinfrastructure
Council, 2007, p. 38)


Although many articulated the need for 
educating a workforce that understands the 
importance of managing and curating data in 
ways that support broad dissemination, use 
by others, and preservation beyond the life 
of its original research project, there has been 
very little examination of what such a pro- 
gram would contain. We believe that librar- 
ians have a role in developing these education 
programs and will need to actively engage in 
these discussions. 

Gabridge (2009) notes that institutions ex- 
perience 


a constantly revolving community of students
who arrive with . . . uneven skills in data management.
. . . Librarian subject liaisons already
teach students how to be self-sufficient, independent
information consumers. This role
can be easily extended to include instruction
on data management and planning. (p. 17)


With the respectful elision of “easily,” we ar- 
gue in the remainder of this chapter that there 
are indeed gaps in the knowledge of current e- 
researching faculty and students (both as pro- 
ducers and consumers of data) that librarians 
may address by developing DIL curricula. 


Environmental Scan of 
Related Literacies 


For the sake of clarity, it is important to dis- 
tinguish DIL from other literacies such as 
data literacy, statistical literacy, and informa- 
tion literacy. Typically, data literacy involves 
understanding what data mean, including 
how to read graphs and charts appropriately, 
draw correct conclusions from data, and rec- 
ognize when data are being used in misleading 
or inappropriate ways (Hunt, 2004). Statisti- 
cal literacy is “the ability to read and interpret 
summary statistics in the everyday media: in 
graphs, tables, statements, surveys and stud- 
ies,” (Schield, 2010, p. 135). Schield finds 
common ground in data, statistical, and in- 
formation literacy, stating that information 
literate students must be able to “think criti- 
cally about concepts, claims, and arguments: 
to read, interpret and evaluate information.” 
Furthermore, statistically literate students 
must be able to “think critically about basic 
descriptive statistics, analyzing, interpreting 
and evaluating statistics as evidence.” Data lit- 
erate students must “be able to access, assess, 
manipulate, summarize, and present data.” In 
this way, Schield (2004, p. 8) creates a hierar- 
chy of critical thinking skills: data literacy is 
a requisite for statistical literacy, and, in turn, 
statistical literacy is required for information 
literacy. Stephenson and Caravello (2007) ex- 
tol the importance of data and statistical litera- 
cies as components of information literacy in 
the social sciences, arguing that the ability to 
evaluate information essentially requires that 
one understand the data and statistics used in 
an information resource. 

Qin and D’Ignazio (2010) developed a 
model, Science Data Literacy, to address the 
production aspect of data management. SDL 
refers to “the ability to understand, use, and
manage science data” (p. 2) and an SDL edu- 
cation 


serves two different, though related, pur- 
poses: one is for students to become e-science 
data literate so that they can be effective sci- 
ence workers, and the other is for students 
to become e-science data management pro- 
fessionals. Although there are similarities in 
information literacy and digital literacy, sci- 
ence data literacy specifically focuses less on 
literature-based attributes and more on functional
ability in data collection, processing,
management, evaluation, and use. (p. 3)


Whereas definitions of data, statistical, and 
information literacy focus on the consumption 
and analysis of information, the production 
of information is often overlooked in literacy 
instruction. E-research is, by definition, a social
process, and contributing to—not just extracting
from—the community’s knowledge
base is crucial. DIL, then, merges the concepts 
of researcher-as-producer and researcher-as- 
consumer of data products. It builds upon and 
reintegrates statistical, information, and sci- 
ence data literacy into an emerging skill set. 


Prior Instructional Efforts in 
Data Information Literacy 


Several libraries have developed programs or 
prototypes to address those needs. The Mas- 
sachusetts Institute of Technology Libraries 
created a robust “Manage Your Data” subject 
guide/tutorial, supplemented by seminars such 
as Managing Research Data 101 (Graham, 
McNeill, & Stout, 2011). Both resources include
data planning checklists that cover the
following topics:

• Documentation and metadata
• Security and backups
• Directory structures and naming conventions
• Data sharing and citation
• Data integration
• Good file formats for long-term access
• Best practices for data retention and archiving


The University of Virginia Library created 
the Scholars’ Lab and Research Computing 
Lab. These projects, collaborative ventures 
between IT and library departments, created 
a new service model that included traditional 
roles for IT (software support and training) and 
librarians (subject knowledge and departmen- 
tal interactions), as well as services that bridged 
those disciplines such as data management and 
analysis, computational software support, and 
knowledge of emerging technologies. Librar- 
ians from the University of Virginia explained: 
“We chose to promote the service areas of 
software support, current awareness, data, col- 
laboration, and research communication. . . . 
Collectively, we view these as being supportive 
pieces to the entire research lifecycle, rather 
than just a single point” (Hunter, Lake, Lee, 
& Sallans, 2010, p. 341). While the University 
of Virginia model focused primarily on refer- 
ence and project-based services, the Scholars’ 
Lab also provided workshops and seminars on 
special topics in data management such as GIS, 
Web application development, and text digiti- 
zation. 

The Science Data Literacy project at Syra- 
cuse University developed a program “to train 
students with the knowledge and skills in col- 
lecting, processing, managing, evaluating, 
and using data for scientific inquiry” (Qin & 
D’Ignazio, 2010, p. 2). As part of the project, 
Qin developed a credit-bearing course, Science
Data Management, covering the fundamentals
of scientific data and its description, manipula- 
tion, visualization, and curation. Project SDL 
made its syllabus for the course, with lecture 
notes, available online (Science Data Literacy 
Project, 2010). 

The Purdue University Libraries are active 
in this area as well. Two of the authors of this 
chapter developed a Geoinformatics course 
with a faculty member in the Department of 
Earth, Atmospheric, and Planetary Sciences 
(Miller & Fosmire, 2008). The instructors de- 
signed Geoinformatics for beginning graduate 
and advanced undergraduate students. The 
course provided a holistic approach to GIS 
and spatial data, encompassing the full cycle 
of data, from discovery and acquisition to con- 
version and manipulation, analysis, and finally 
visualization, metadata, and re-sharing. The 
syllabi are online (Miller, 2010). 


ASSESSMENTS OF FACULTY 
AND STUDENT NEEDS IN DATA 
INFORMATION LITERACY 


Like e-research, DIL is not new, but rather 
compiles expertise and portions of existing 
research methods, information and other lit- 
eracies, and computing curricula to offer more 
holistic, communal, and participatory perspec- 
tives and techniques for e-researchers. Just as e- 
research encourages researchers from a variety 
of disciplines to collaborate to advance scien- 
tific knowledge, disciplinary and library faculty 
must work together to determine the skill sets 
that a data literate student should demonstrate 
and to develop best practices for imparting 
those skills to the students. Both faculty mem- 
bers and students have perspectives on the 
necessary data management skill sets in their
fields. Grounded in these perspectives are their
real-world perceptions and practices and a first- 
hand knowledge of how one conducts research 
in his or her respective discipline. Any attempt 
to define a DIL program must be aligned with 
current disciplinary practices and cultures if 
it is to be relevant to and accepted by its in- 
tended audience(s). The authors compiled the 
perspectives of both faculty and students from 
two different research projects, one based on 
interviews with faculty members and the other 
on surveys of students and an analysis of their 
course work. In the next two sections, the au- 
thors report on the DIL priorities articulated 
by both faculty and students as discovered 
through these assessments.


Assessment of Faculty Needs: 
A Reexamination of the Data 
Curation Profiles Project 


In the fall of 2007, the Purdue University Li- 
braries and the Graduate School of Library and 
Information Science at the University of Illinois 
at Urbana-Champaign (UIUC) received fund- 
ing from the Institute of Museum and Library 
Services (IMLS) to carry out the Data Curation 
Profiles (DCP) project. The goals of the DCP 
project were to better understand the willing- 
ness of research faculty to share their data with 
others—including the conditions necessary for
data sharing to take place—and to investigate 
possible roles for librarians in facilitating data 
sharing and curation activities. 

The investigators interviewed participating
faculty at Purdue and UIUC, focusing on three 
broad areas: the nature and life cycle of one 
of the data sets generated by researchers; their 
data management practices; and their needs for 
making their data available to others and curat- 
ing their data for long-term access. These inter- 
views resulted in the creation of “data curation 
profiles,” each of which summarized the information
gathered from the interview under a
common framework that enabled comparisons 
to be made among the researchers’ responses 
(Witt, Carlson, Brandt, and Cragin, 2009). 

The first round of interviews for the DCP
project took place at Purdue and UIUC in the
summer and early fall of 2008. A convenience
sample of faculty participants was recruited
from a broad selection of departments in the
sciences and engineering on the basis of prior
relationships with project personnel or liaison
librarians. The semi-structured interviews
asked broad, open-ended questions to allow
participants to control the direction of the
discussion and identify the most important
issues related to sharing and curating their
data. The investigators then extracted common
themes from the transcripts using grounded theory.

One of the common themes emerging from 
the interviews concerned the skills, knowledge, 
and training needed by graduate students to 
effectively manage and curate research data. 
Graduate students actively generated and cu- 
rated data in support of their own research. 
Many also oversaw the management of data 
generated by the entire research group. A few 
of the faculty noted that their graduate students 
had been asked to share their data with individ- 
uals not affiliated with the research and there- 
fore had to consider similar issues of whether 
or not to share and what conditions to place 
on sharing. Typically, faculty reported that
graduate students were unprepared to manage 
or curate the data effectively. While acknowl- 
edging that this was an area of concern, they 
often could not provide adequate guidance or 
instruction because it was not an area that they 
knew well or fully understood. 

The investigators conducted a second round
of interviews in the spring of 2009 to gather
additional details from faculty and address gaps
from the first interview. Investigators asked the
faculty participants at Purdue whether there
was a need for a data management and curation
training program for graduate students, and
what such an educational program should
contain. Responses from these second interviews
were coded and analyzed with the information
from the first interviews. A total of 19 faculty
from both schools completed both interviews.


Faculty Assessment: Results


Generally, faculty in this study expected their 
graduate students to carry out data manage- 
ment and handling activities. However, the 
extent of data management responsibilities 
varied among the faculty interviewed. Some 
took an active, hands-on role in managing 
their data with minimal student involvement, 
while others delegated most data management 
tasks to their students. Typical responsibilities 
of graduate students included processing or 
cleaning the data to enable use or analysis, as- 
suring the quality of the data, compiling data 
from different sources, and organizing the 
data for access and use by project personnel. 


In addition, faculty often considered data 
management duties as distinct from other re- 
search responsibilities. 

Analysis of the interviews revealed that the 
training graduate students received and the 
training methods varied widely. Some of the 
researchers taught their graduate students data 
management tasks, such as how to develop 
and assign metadata to the data files. Other re- 
searchers reported that their graduate students 
had not received much, if any, formal training 
in data management and were left to figure 
things out on their own. 

Given the variance in the range of respon- 
sibilities and training in data management 
received by graduate students, it is not sur- 
prising that faculty presented a mixed picture 
in assessing the work of their students in this 
area. Several faculty expressed frustration with 
their inability to understand or make use of the 
data their students had been working on, es- 
pecially after they graduated. Other comments 
provided a positive statement of individual 
students’ skills, which they generally acquired 
without formal training. 

The overwhelming majority of researchers 
in this study felt that their students needed 
some form of DIL education. However, even 
in stating a need for such a program, several re- 
spondents expressed an uncertainty or a reluc- 
tance to teach data management skills to their 
students themselves. Some faculty expressed 
a concern about getting too involved in tell- 
ing students what to do in what should be the 
students’ own work, or in making their work 
more difficult by introducing new software or 
formats to work with. Furthermore, although 
faculty identified the lack of data management 
skills in their graduate students as a strong con- 
cern and described broad themes that should 
be addressed, they often could not articulate
precisely what skills should be taught to remedy
the situation.


Interviewer: Is there a need for education
in data management or curation for graduate
students?

Faculty: Absolutely, God yes . . . I mean
we’re . . . We have the ability to accumulate
huge datasets now[,] especially with the new
tools that we have.

Interviewer: So, what would that education
program look like, what would it consist
of? What kind of things would be taught?

Faculty: Um, I would say, um, and I don’t
really know actually, just how do you manage
data? I mean, where do you put it? Um, how
secret does it need to be? Or you know, confidentiality
things, ethics, probably um . . . I’m
just throwing things out because I hadn’t really
thought that out very well. (Soil Scientist)


After coding and analysis, several major 
themes emerged from the faculty's observations 
of graduate students’ deficiencies in data man- 
agement. These themes are metadata, standard- 
izing documentation processes, maintaining 
relationships among data, ethics, quality assur- 
ance, basic database skills, and preservation. 


Metadata 

An understanding of metadata and how to 
apply it were frequently mentioned as areas 
of need, although the term metadata was not 
used often. More often, researchers said their 
students needed to know how to annotate 
and describe data. In most cases, references 
to “annotations” included both a need to pro- 
vide information about a data file as well as 
information about individual components of 
the data (such as a cell in a spreadsheet). The 
main reasons for providing metadata include 
assuring that data can be understood by others
(both within the lab and by external audiences),
enabling its continued usability over
time, and fostering use of the data beyond its 
original purpose. 

Researchers also expressed the need to ap- 
ply and conform to metadata standards. One 
researcher stated that not only must students 
be taught “how to approach the idea of meta- 
data,” but also they must develop an awareness 
of standardized disciplinary ontologies and 
how to apply them to their own work. 
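
The two levels of annotation described above, information about a data file as a whole and information about its individual components, can be sketched in Python as a small metadata record stored beside the data. This is only an illustration: the field names, the sidecar-file convention, and the example values are assumptions and do not come from the interviews or from any disciplinary metadata standard.

```python
import json
from pathlib import Path


def build_metadata(data_file, description, creator, variables):
    """Return a dictionary describing a data file and each of its columns."""
    return {
        "file_name": Path(data_file).name,
        "description": description,
        "creator": creator,
        # One entry per column: what it means and its unit of measure.
        "variables": variables,
    }


def write_sidecar(data_file, metadata):
    """Write the metadata next to the data file as <name>.metadata.json."""
    sidecar = Path(str(data_file) + ".metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


# Hypothetical example values, invented for illustration.
record = build_metadata(
    "soil_moisture_2009.csv",
    "Weekly soil moisture readings from field plots",
    "Graduate student A",
    {
        "plot_id": {"meaning": "Field plot identifier", "unit": None},
        "moisture": {"meaning": "Volumetric water content", "unit": "percent"},
    },
)
```

A lab adopting a disciplinary standard would replace these ad hoc field names with the elements that standard defines.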


Standardizing Documentation Processes 
Standardizing documentation processes is a 
rather broad theme that applies to both high- 
level organization as well as to specific, local 
needs. Researchers frequently reported a need 
for students to be able to organize data by doc- 
umenting it in a systematic and logical fashion. 
Explanations given for the need for rich
documentation often extended beyond the
immediate needs of the researcher's lab and
included such high-level needs as enabling the
sharing of data outside the research team,
submission to repositories, reuse by external
audiences, and preservation beyond the research
life cycle. At the local level, this category
addresses folder and file naming conventions,
data sharing among the lab/project team(s),
and assigning staff responsibilities for
managing data, communication, and workflow.

Researchers expected their graduate students
to share responsibility for documenting the lab 
or project’s data, as well as the student’s own 
interactions with it. Documenting data focuses 
on what needs to be recorded and provided 
while generating, processing, analyzing, and/or 
publishing the data to later validate and verify 
it. This includes such tasks as generating and 
maintaining data dictionaries, glossaries, or 
definitions of variables; maintaining lab note- 
books or their equivalent; and capturing the 
provenance of the data. Overall, researchers ex- 
pressed that students’ documentation needs to 
stand the test of time. 

Researchers in this study acknowledged the 
problem of data documentation, not only for 
their students but for themselves as well. Dif- 
ficulties in documenting data contributed to a 
larger concern: the lack of standardization and 
consistency in how the data are organized. Fac- 
ulty repeatedly mentioned that every student 
employs different methods of documenting his 
or her data. The lack of standardized and shared 
data management protocols and practices across 
a research group often led to a “tower of Ba- 
bel” situation, where it is difficult to understand 
what was done, by whom, and for what reason. 
This further led to difficulties in correlating 
and relating one data file with another or with 
the data collection as a whole. The inevitable 
turnover of students exacerbated this problem. 
Although most of the researchers in this study 
required their students to document their work 
with the data, actual documentation practices 
followed by the students varied from one to 
the next. Moreover, they often did not provide 
complete or detailed enough documentation to 
enable others to understand their work. 

Several researchers suggested creating a stan- 
dard operating procedure for data formatting 
and management. One faculty member noted 
that he created standard operating procedures 


for most equipment and procedures in the lab 
and proposed that a similar standard operating 
procedure be developed for handling and man- 
aging his data. When asked to describe an ideal 
situation for organizing data, several of the 
faculty members noted the need for students 
to develop and use a standardized set of best 
practices. 
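
One piece of such a standard operating procedure, a shared file naming convention, could be checked automatically. The Python sketch below assumes a hypothetical convention of project, date, initials, and version number; the pattern itself is invented for illustration and was not proposed by any of the faculty interviewed.

```python
import re

# Hypothetical convention: <project>_<YYYYMMDD>_<initials>_v<NN>.<ext>
PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)"
    r"_(?P<date>\d{8})"
    r"_(?P<initials>[a-z]{2,3})"
    r"_v(?P<version>\d{2})"
    r"\.(?P<ext>csv|txt|xlsx)$"
)


def parse_name(file_name):
    """Return the name's parts as a dict, or None if it breaks the convention."""
    match = PATTERN.match(file_name)
    return match.groupdict() if match else None


def nonconforming(file_names):
    """List the file names that do not follow the convention."""
    return [name for name in file_names if parse_name(name) is None]
```

Running `nonconforming` over a shared folder listing would give a lab a quick report of files that need renaming before a student leaves the group.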


Maintaining Relationships Among Data: 
Master Files and Versioning 

Many interviewees described the challenge of 
relating data files to each other. This includes 
issues related to taking data generated at a par- 
ticular time or for a particular purpose and en- 
abling its integration with other data to create 
a new data set. This category also includes the 
converse action, generating a subset of the data 
from a larger data set or file. 

Several researchers specifically mentioned 
the need for the creation of an official record of 
the data (a “master file”) to ensure the author- 
ity and integrity of this record compared to the 
working copies of data sets or files created and 
used for specific purposes by subsets of lab or 
project personnel. 

Many researchers desired that the master 
file bring a number of disparate files together 
into a searchable database that engenders ques- 
tion development and helps assure quality con- 
trol for research. A lack of standardization in 
data management practices, a high learning 
curve, and a perceived lack of support for the 
advanced database utilities and programs re- 
quired to create such files hindered the ability 
of researchers to achieve these goals. 

Researchers expressed the need to balance 
the requirements for a particular research proj- 
ect with those for making the data accessible 
and useful to the larger research community. 

This focus on the specific research needs of 
the student (or the faculty sponsor in some
cases) often led to situations in which the faculty
member could not retrace the steps taken
in processing the data and relate the graduate 
student’s work back to the larger data set to 
which it belonged. 

Akin to these issues of compiling or merg- 
ing data, researchers frequently brought up 
versioning as an often neglected but very im- 
portant concept for students to learn. In this 
study, researchers clearly reported the impor- 
tance of maintaining documentation of dif- 
ferent versions of their data. They wanted to 
know which data files were used for what anal- 
ysis, what file contained the current version be- 
ing used by the research group, and how these 
versions differed from each other. However, 
several faculty members admitted that they 
themselves had a difficult time in maintaining 
adequate documentation and struggled to con- 
sistently generate the needed documentation 
in a timely manner. 
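
The three questions researchers wanted answered (which file was used for which analysis, which version is current, and how versions differ) can be captured in a simple version log. The following Python sketch is one possible approach under stated assumptions: the log structure, field names, and example file contents are hypothetical, and a research group might equally rely on an existing version control system.

```python
import hashlib
from datetime import date


def checksum(content: bytes) -> str:
    """Return the SHA-256 hex digest that identifies this exact content."""
    return hashlib.sha256(content).hexdigest()


def log_version(log, file_name, content, note, used_for=None):
    """Append a version entry unless the content matches the latest version."""
    digest = checksum(content)
    history = [e for e in log if e["file_name"] == file_name]
    if history and history[-1]["sha256"] == digest:
        return history[-1]  # nothing changed, so no new version is recorded
    entry = {
        "file_name": file_name,
        "version": len(history) + 1,
        "sha256": digest,
        "date": date.today().isoformat(),
        "note": note,          # how this version differs from the previous one
        "used_for": used_for,  # which analysis relied on this version
    }
    log.append(entry)
    return entry


def current_version(log, file_name):
    """Return the most recent entry for a file, or None if it was never logged."""
    history = [e for e in log if e["file_name"] == file_name]
    return history[-1] if history else None


# Hypothetical example: three saves, one of which changes nothing.
log = []
log_version(log, "yield.csv", b"plot,yield\n1,4.2\n", "Raw export from instrument")
log_version(log, "yield.csv", b"plot,yield\n1,4.2\n", "Re-saved, no changes")
log_version(log, "yield.csv", b"plot,yield\n1,4.3\n", "Corrected plot 1",
            used_for="Figure 2")
```

Comparing checksums rather than file names or dates means that a file which was merely re-saved does not produce a spurious new version.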


Ethics 

Faculty members in this study identified “data 
ethics” as another area where most students 
need assistance. Data ethics includes intellec- 
tual property rights and ownership of data, 
issues of confidentiality/privacy and human 
subjects, implications and obligations of shar- 
ing (or not sharing) data with others (includ- 
ing open access), and assigning attribution and 
gaining recognition of one’s work. Although 
faculty clearly stated ethics as a needed area 
of instruction, they generally did not provide 
much description as to what the curriculum of 
such an ethics program would include. In one 
case, the professor tied ethics to an understand- 
ing of ownership of data. 


Basic Database Skills 


Several researchers expressed the expectation 
that students be able to understand and develop 
relational databases and use database tools effectively.
Frequently, students’ lack of basic
understanding of database development and 
usage frustrated the interviewees. However, the 
expectations of student skills differed among 
the researchers. A civil engineering professor 
acknowledged that students needed some basic 
understanding of relational databases, normal- 
ization of data, database tools, and documenta- 
tion techniques. 
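
To make the terms "relational database" and "normalization" concrete, the following minimal sketch uses Python's built-in `sqlite3` module. The table names, columns, and sample values are hypothetical. Site details are stored once in a `site` table and referenced by key from a `measurement` table, rather than being repeated on every measurement row.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site (
        site_id   INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        latitude  REAL,
        longitude REAL
    );
    CREATE TABLE measurement (
        measurement_id INTEGER PRIMARY KEY,
        site_id        INTEGER NOT NULL REFERENCES site(site_id),
        measured_on    TEXT NOT NULL,   -- ISO 8601 date
        flow_cms       REAL             -- cubic meters per second
    );
""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO site VALUES (?, ?, ?, ?)",
    [(1, "Upstream", 40.43, -86.91), (2, "Downstream", 40.41, -86.90)],
)
conn.executemany(
    "INSERT INTO measurement (site_id, measured_on, flow_cms) VALUES (?, ?, ?)",
    [(1, "2010-03-01", 12.0), (1, "2010-03-02", 14.0), (2, "2010-03-01", 20.0)],
)

# A join recombines the normalized tables to answer a question.
rows = conn.execute("""
    SELECT s.name, COUNT(*) AS n, AVG(m.flow_cms) AS mean_flow
    FROM measurement AS m
    JOIN site AS s ON s.site_id = m.site_id
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()

for name, n, mean_flow in rows:
    print(name, n, mean_flow)
```

Because each site's coordinates live in exactly one row, correcting a coordinate requires one update instead of an edit to every measurement.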


Quality Assurance 

Researchers expected their graduate students 
to review or check their data and evaluate its 
quality. Interviewees mentioned the difficulty 
of knowing exactly what their students had 
done to compile and analyze the data. Thus 
the provenance of the data was unknown. One 
professor stated that she could not understand 
the work done by her students. 

Quality assurance is in some ways a blend 
of technical skills (familiarity with equipment), 
disciplinary knowledge (whether the result is 
even theoretically possible), and a metacogni- 
tive process that requires synthesis on the part 
of the students. Primarily, quality assurance is
the ability to recognize a pattern or consistency
in the data. Quality assurance may also facilitate
or impede the quality of documentation
(annotation/metadata) produced, and the
organizational schema, of a given data set.

Faculty repeatedly mentioned that every student
employs different methods of documenting his or
her data.
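
Part of the quality assurance described in this section can be expressed as simple automated checks. The sketch below is a hypothetical example, not a procedure used by the interviewees: it flags missing values, placeholder ("sentinel") values, and readings outside a range that disciplinary knowledge says is plausible. The function name, the sentinel codes, and the temperature range are all assumptions.

```python
def quality_report(records, field, valid_range, sentinels=(-999, -9999)):
    """Return the row indexes that fail each basic quality check."""
    low, high = valid_range
    report = {"missing": [], "sentinel": [], "out_of_range": []}
    for i, rec in enumerate(records):
        value = rec.get(field)
        if value is None:
            report["missing"].append(i)
        elif value in sentinels:
            # Placeholder codes are often left in by instruments or exports.
            report["sentinel"].append(i)
        elif not (low <= value <= high):
            # The range encodes what is theoretically possible for the field.
            report["out_of_range"].append(i)
    return report


readings = [
    {"site": "A", "temp_c": 14.2},
    {"site": "B", "temp_c": None},
    {"site": "C", "temp_c": -999},
    {"site": "D", "temp_c": 140.0},
]

report = quality_report(readings, "temp_c", valid_range=(-40.0, 60.0))
print(report)
```

Checks like these cover only the mechanical portion of quality assurance. Choosing the valid range and deciding what to do with flagged rows still requires the disciplinary judgment discussed above.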


Preservation 

Researchers expect their students to know how 
to preserve their data and document the pro- 
cessing of the data. Much like the discussion 
of metadata, faculty members generally
understood the term preservation in a broad and loose
sense of the word, often conflating it with the
simple backing up of files. They were unaware 
of or unacculturated to preservation from a li- 
brary perspective, instead focusing much more 
on the immediate issues and procedures sur- 
rounding backing up their data. 

Although researchers recognized the need for 
backups, the methods and timing of performing 
backups differed considerably among research 
groups. Some, having learned the hard way 
through lab disasters, kept geographically dis- 
persed backups. Others relied largely on gradu- 
ate students to create backups on departmental 
servers. Still others had no real-time backup 
system in place. A common problem expressed 
with backups was tracking versioning. 


Faculty Assessment: Lessons Learned 


The design of any DIL program requires an un- 
derstanding of the real-world needs of research 
groups, where research either progresses or is 
impeded by their ability to handle data in the 
ways described here. The faculty supervisors 
are no doubt acutely aware of the deficiencies 
in their students’ abilities to properly care for 
their research input and output. The interviews 
analyzed for this study provide a window into 
the ground-level interaction with data and in 
fact become a magnifying glass through which 
we can spot the deficiencies and gaps in knowl- 
edge that a DIL curriculum might target. 

We would be remiss, however, not to account for
the gaps in faculty responses on data practices,
as these interviews also expose faculty
interaction with data. Many faculty admitted or
otherwise revealed that
they themselves lacked expertise or experience
with data management, even as they critiqued 
their students’ abilities. We must assume their 
critiques of their students’ (and their own) facil- 
ity with any or all aspects of data management 
may be somewhat shallow. In other words, they 
may not know what they don’t know about data 
management and curation. Therefore, a pro- 
gram based entirely on faculty self-report risks 
incompleteness, and other viewpoints on what 
should constitute the objectives for DIL must be 
taken into account. 

As a complement, then, the next section 
will draw conclusions that help to complete 
our DIL core objectives from a direct source, 
a course taught at Purdue University that 
broached some of these exact topics, including 
data source evaluation, metadata, databases, 
preservation, and sharing. 

This course allowed us to examine the DIL 
of students directly and learn from firsthand 
observation. Because we gained insight into 
what the students do not know, our own evalu- 
ation of student performance in a (classroom- 
simulated) research environment can serve as 
an important second front in developing a 
richer and more comprehensive list of core DIL 
objectives. 


ASSESSMENT OF STUDENT NEEDS 


Enrollees in the 2008 and 2010 offerings of the 
course Geoinformatics provided the sample 
population for our student assessment. The 
combined number of students enrolled totaled 
27: 12 in 2008 and 15 in 2010. Most of these 
were students majoring in earth, atmospheric, 
and planetary sciences, but other majors rep- 
resented in this course included civil engineer- 
ing, agricultural and biological engineering, 
and forestry and natural resources. In 2008, the
core course content revolved around a
“whodunit” concept. Students were asked to track
down, over the course of several laboratory 
exercises, the location of a fictitious chemical 
spill by gathering data (both spill data and un- 
derlying geology) and using various geospatial 
analysis and visualization techniques. Student 
projects provided the rest of the context for 
learning DIL skills. The 2010 course dropped 
the “whodunit” mechanism to shift more at- 
tention toward a longer, more involved semes- 
ter project. 

To improve and tailor the course, the authors 
used several methods to probe students’ inter- 
ests, their perceived needs, and their abilities to 
carry out data management tasks. Among these 
were a pre-course assessment to inventory the 
students’ technology and information skills and 
a post-course survey to determine their percep- 
tions of how important different topics were to 
their research. The instructors also analyzed stu- 
dent semester projects to determine how well 
they demonstrated mastery of DIL skills. 

We administered the pre-course survey in 
both offerings of Geoinformatics. It contained 
short-answer questions, mainly probing the 
students’ background in databases, GIS, and 
programming, such as “What computer pro- 
gramming languages do you know (for exam- 
ple, Fortran, C)?” and “What geospatial soft- 
ware do you use?” The instructors then tailored 
the course content to address the ability levels 
of the students. The post-course survey was 
given only to students in 2008. For each course 
topic, students rated, on a 5-point Likert scale, 
the lectures, the lab, and the importance of the 
topic to the course and to their own research. 
They also recommended improvements to the 
course labs. 

These instruments probed students’ attitudes 
toward various topics related to DIL. However,
there were disconnects between student
perceptions and their performance. As Grimes
and Boening (2001), among others, have ob- 
served, novices tend to overstate their expertise, 
in large part because they don't know what they 
don't know. To provide a check of the degree 
to which the students actually demonstrated 
DIL skills, the instructors analyzed the stu- 
dents’ projects. The project required students 
to identify a problem or research question with 
geospatial components and use the skills and 
techniques discussed in class to advance that 
research and present the results of their work. 
It required both the acquisition of original 
data and the use of external, “published” data. 
And it involved analysis and visualization and 
required a summary of how the research an- 
swered or at least clarified the question or prob- 
lem. It should be noted that this course did not 
teach research methods or disciplinary content 
knowledge: the students needed to get content 
assistance from their own research group. 


Student Assessment: Results 


Although in both course offerings several stu- 
dents indicated they had a rudimentary under- 
standing of the technologies identified in the 
pre-course survey, none indicated that they felt 
able to command the tools to accomplish their 
own ideas and solutions. The survey, in fact, 
revealed low levels of exposure to most of the 
course content. Students reported little experi- 
ence with GIS at all, and the experience they 
had was limited to a handful of data types and 
rather turnkey operations. Both offerings of the 
course required the instructors to cover funda- 
mental concepts before moving on to a higher 
order agenda. These lessons included an intro- 
duction to databases and data formats, basic 
use of GIS and GPS tools, rudimentary visual- 
ization and analysis techniques, and metadata 
and presentation skills. The instructors decided
against using some technologies because, for
example, students had no experience working
in Unix/Linux systems or using low-level
programming languages.

TABLE 1.1

Results of the 2008 Post-Course Survey, on a 5-Point Likert Scale, of the
Importance of Different Topics to the Course and to the Students’ Research (n = 5)

Topic                     Importance to Course    Importance to Research
Databases                 4.8                     5.0
Data formats              5.0                     4.8
Data gateways/portals     4.6                     4.6
Introduction to GIS       4.8                     4.8
GIS analysis              5.0                     5.0
GIS data conversion       5.0                     5.0
Workflow management       4.6                     4.6
Metadata                  5.0                     5.0
Statistics                4.6                     4.4
GPS                       4.6                     4.2
Data visualization        5.0                     5.0
Ontologies                4.0                     3.6
Data preservation         4.2                     4.2


Students indicated a high level of interest 
in all the topics covered in the class and had 
an appreciation for DIL skills. In the standard 
end-of-course evaluations to which all stu- 
dents (n = 12) responded, the course received 
an overall rating of 4.8 out of 5.0, and several 
students remarked that after taking the course 
they finally understood what they were doing 
and now could contribute new procedures for 
analyzing data to their research groups. Of the 
12 enrolled students, 5 completed the 2008 
post-course survey, with the results summa- 
rized in Table 1.1. 
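
The ratings in Table 1.1 are few enough to check by hand, but a short script makes the summary reproducible. The sketch below transcribes the table values and lists the topics rated below 4.0 for importance to research. The variable names are our own, and with only five respondents the averages should be read as indicative rather than conclusive.

```python
# (importance to course, importance to research), transcribed from Table 1.1
ratings = {
    "Databases": (4.8, 5.0),
    "Data formats": (5.0, 4.8),
    "Data gateways/portals": (4.6, 4.6),
    "Introduction to GIS": (4.8, 4.8),
    "GIS analysis": (5.0, 5.0),
    "GIS data conversion": (5.0, 5.0),
    "Workflow management": (4.6, 4.6),
    "Metadata": (5.0, 5.0),
    "Statistics": (4.6, 4.4),
    "GPS": (4.6, 4.2),
    "Data visualization": (5.0, 5.0),
    "Ontologies": (4.0, 3.6),
    "Data preservation": (4.2, 4.2),
}

# Topics whose research rating falls below the "very important" level of 4.0.
below_threshold = [topic for topic, (_, research) in ratings.items()
                   if research < 4.0]

# Topics students rated as more important to the course than to their research.
course_over_research = [topic for topic, (course, research) in ratings.items()
                        if course > research]

mean_research = sum(research for _, research in ratings.values()) / len(ratings)

print("Below 4.0 for research:", below_threshold)
print("Course rated above research:", course_over_research)
print("Mean research rating:", round(mean_research, 2))
```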

The high level of interest in basic topics
such as data formats and an introduction to
databases indicates the relative lack of
preparation in the core technology skills necessary to
work in an e-research environment. All but one
topic (ontologies) received a rating of at least
4.0 (very important) as important to research.
In addition to extracting information from
course surveys, the instructors also carefully
examined students’ completed course work
to determine which concepts, skills, or ideas
students still lacked. For example, the authors
found that most students had ready access to
the primary data used by their research groups
and that these data often formed the basis for
their semester project analysis. A focus of the
course was on students’ abilities to identify
and synthesize supplementary data, such as
topographic, political, or land-use data, to overlay on
the data collected by the research group.
Analysis of the student semester projects indicated
that students indeed could find, identify, and
incorporate external data sources into their 
analysis and/or visualization. 

However, the analysis of the students’ se- 
mester projects from both years revealed re- 
curring shortcomings. While students did ap- 
ply external data appropriately to their work, 
frequently these data were not cited properly. 
Although students correctly documented tradi- 
tional published literature, they might not con- 
sider data to be a valid, citable scholarly source 
or have a clear understanding of how to cite a 
data set. 
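
A data set can be cited with much the same elements as a publication: creator, year, title, publisher or repository, and a persistent identifier. The sketch below assembles such a citation. It is an assumption-laden illustration: the function name, the element order, and every value in the example (including the DOI) are fictitious, and a real citation should follow the style required by the repository or journal involved.

```python
def format_data_citation(creators, year, title, publisher, identifier,
                         version=None):
    """Build a citation string from the basic elements of a data set."""
    authors = "; ".join(creators)
    parts = [f"{authors} ({year}).", f"{title}."]
    if version:
        # The version matters because data sets change after release.
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    parts.append(identifier)
    return " ".join(parts)


# All values below are invented for illustration.
citation = format_data_citation(
    creators=["Doe, J.", "Roe, R."],
    year=2010,
    title="Example stream gauge observations",
    publisher="Example University Data Repository",
    identifier="https://doi.org/10.1234/example",
    version="1.0",
)
print(citation)
```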

Students also struggled to fully comprehend 
the importance and complexity of data sharing, 
though the course was geared toward pushing 
this point explicitly. The following issues ap- 
peared multiple times over the two separate 
semesters: 


1. Preservation/archiving. The students’ final
task in 2008 was to submit their data
to the GEON Portal (www.geongrid.org)
for safekeeping and redistribution.
In 2010, GEON was merely a sugges- 
tion and students were encouraged to 
identify a repository in their domain to 
which they could submit their project 
data. Although many students attempted 
these submissions in good faith (despite 
some technical difficulties with GEON 
both years), several students shared the 
sentiments of one in particular, who ar- 
gued that a department-run website that 
“everybody in the [domain] community 
knows about” was a better ultimate des- 
tination for their data than any more for- 
mal data repository. 

2. Metadata. Although the time allocated 
for metadata was limited, the instruc- 
tors managed to include the concepts of 
schema, authoritative terminology, XML,
indexing, and searchability. Each course
offering had a metadata unit during 
which instructors introduced students to 
several proper examples of metadata. The 
students then completed a lab in which 
they wrote their own simple metadata 
documents. While some students did 
write good accompanying metadata for 
their final project materials, most did not. 
One deficit seemed to arise from students 
creating metadata from the perspective 
of “how I did it,” rather than striving to 
make the data more discoverable by the 
next scientist down the line.

3. The technologies and workflows of data
sharing. Students (despite instructor 
warnings) expected to accomplish far 
more than they were able during a single 
semester. This was an outcome of stu- 
dents’ expectations that, once analyzed, 
their data could be visualized fairly eas- 
ily and shared online. The complexity of 
building data-driven, interactive Web ap- 
plications was not apparent until it was 
too late. 
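
The metadata issue in item 2 can be illustrated with a small XML record written for discovery rather than as a lab diary. The sketch below uses Python's standard `xml.etree.ElementTree` module. The element names form a made-up, simplified schema for this example and do not correspond to a published metadata standard, and the sample values are fictitious.

```python
import xml.etree.ElementTree as ET


def build_metadata(title, creator, keywords, bounding_box, date_range, abstract):
    """Build a discovery-oriented metadata record (hypothetical schema)."""
    root = ET.Element("dataset")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "creator").text = creator

    # Keywords, ideally from an authoritative vocabulary, support searching.
    subjects = ET.SubElement(root, "subjects")
    for keyword in keywords:
        ET.SubElement(subjects, "keyword").text = keyword

    # Where and when the data apply: what the next scientist searches on.
    west, south, east, north = bounding_box
    ET.SubElement(root, "spatialCoverage", west=str(west), south=str(south),
                  east=str(east), north=str(north))
    start, end = date_range
    ET.SubElement(root, "temporalCoverage", start=start, end=end)

    ET.SubElement(root, "abstract").text = abstract
    return root


record = build_metadata(
    title="Example groundwater chemistry samples",
    creator="Doe, J.",
    keywords=["groundwater", "water quality"],
    bounding_box=(-87.0, 40.3, -86.8, 40.5),
    date_range=("2010-01-15", "2010-04-30"),
    abstract="Fictitious sample record used to illustrate metadata fields.",
)
print(ET.tostring(record, encoding="unicode"))
```

Each field answers a question a stranger to the project would ask (what, who, where, when), in contrast to a "how I did it" narrative that is meaningful mainly to the person who wrote it.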


DISCUSSION 


The authors sought to triangulate the needs re- 
lated to DIL through interviews with research 
faculty and analysis of the results of our own 
geoinformatics-themed DIL course. We found 
a substantial amount of overlap between the 
needs identified: databases, metadata, data 
sharing, preservation and curation of data, and 
formatting and documentation of data. 

The assessments also uncovered differences 
that were more clearly a focus for one group 
than the other. For example, the interviews 
with faculty members primarily focused on 
data they created themselves, while a significant
portion of the Geoinformatics course involved
locating data from external sources. An analysis 
of course work showed that students needed to 
learn “the basics” of much of information tech- 
nology, even before broaching data issues. Ad- 
ditionally, to manipulate the data, students had 
to learn how to use analysis and visualization 
tools, use workflow management tools, and 
develop a minimum computing background 
to take advantage of the available cyberinfra- 
structure. On the other hand, the production- 
and publication-focused faculty researchers 
described the need for data curation and man- 
agement, such as good versioning, documen- 
tation, and quality assurance and the merging 
of data. In addition, the faculty surfaced the 
concept of data ethics: when to share data, who 
owns data, and how to appropriately acknowl- 
edge data. To that extent, these two investi- 
gations provide complementary information 
about perceived DIL needs. 

We have argued that an understanding of 
either faculty or student practices and needs 
alone is insufficient to develop the founda- 
tional objectives necessary for a DIL program. 
Instead, both faculty and student perspectives 
must be understood and analyzed in tandem 
to inform a more complete understanding of 
what is needed in DIL. We now reintroduce 
another foundational component toward de- 
veloping objectives for a DIL program: the 
perspective of the librarian. The organization, 
description, dissemination, curation, and 
preservation of information resources, which 
increasingly includes research data, are the 
hallmark of librarians. Although DIL must be 
grounded in real-world needs as expressed by 
students and faculty, the librarian brings the 
broader perspective and a connection to the 
larger “information ecology” that exists beyond 
the single research project or classroom. This
connection can ensure that holistic best
practices strengthen current practices.


Comparison of Data Information 
Literacy With ACRL IL Standards 


To help articulate and ground our core DIL 
objectives, we found it useful to examine these 
topics through the prism of the ACRL (Associ- 
ation of College and Research Libraries) infor- 
mation literacy competency standards (2000), 
which have been widely adopted by many in- 
stitutions and accreditation agencies and guide 
many library instruction initiatives. To that 
end, the next section first lists the ACRL stan- 
dards, then briefly examines each standard for 
its relevance to these DIL objectives. 

One readily identifiable gap in applying the 
ACRL information literacy standards to a DIL 
program is the difference in focus. The ACRL 
standards focus on educating information con- 
sumers—people seeking information to satisfy 
an information need. Although faculty and 
students do consume research data, our analy- 
sis of faculty and students indicates a strong 
need to address their roles as data producers as 
well. Therefore, the underlying objectives for 
any DIL program need to accommodate both
the data producer’s viewpoint and that of
the data consumer.

The ACRL standards state that information 
literate individuals are able to: 


1. Determine the extent of information
need.

2. Access needed information efficiently
and effectively.

3. Evaluate information and its sources
critically and incorporate selected
information into one’s knowledge base and value
system.

4. Use information effectively to
accomplish a specific purpose.

5. Understand the economic, legal, and
social issues surrounding the use of
information, and access and use information
ethically and legally. (ACRL, 2000, pp. 2-3)


ACRL Standard One: Determining Nature 
and Extent of Information Need 

When gathering information, one often skips 
the research question formulation stage that 
is the foundation of the information search 
process (Kuhlthau, 2004). However, without 
articulating and understanding the question 
deeply, one cannot arrive at a relevant an- 
swer. The instructors addressed this concept 
in the semester project for the Geoinformatics 
course—for example, the overall assignment 
asked students to identify their research ques- 
tion and determine what data they needed to 
address that question. In the case of geospatial 
data, students needed to determine whether 
to use raster or vector data, because each 
type has its own strengths and weaknesses for 
analysis and presentation. Thus, the authors’ 
curricular topic of databases and data formats 
fit best into this competency standard, as it 
is fundamental to understanding the nature 
of the information needed. In fact, Standard 
One already explicitly addresses data, stating 
that a student “realizes that information may 
need to be constructed with raw data from 
primary sources.” 
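
The raster-versus-vector choice mentioned above can be shown with a toy example. In the sketch below, the same hypothetical well locations are held first as vector data (exact coordinate pairs) and then converted to raster data (a grid of counts per cell). The function name, coordinates, and grid dimensions are assumptions chosen for illustration.

```python
def rasterize(points, x_min, y_min, cell_size, n_cols, n_rows):
    """Count points per grid cell; row 0 is the row with the lowest y values."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        # Points outside the grid extent are ignored.
        if 0 <= col < n_cols and 0 <= row < n_rows:
            grid[row][col] += 1
    return grid


# Vector form: each feature keeps its exact location.
wells = [(0.5, 0.5), (0.7, 0.4), (2.2, 1.9)]

# Raster form: locations are generalized to cells of a fixed size.
grid = rasterize(wells, x_min=0.0, y_min=0.0, cell_size=1.0, n_cols=3, n_rows=2)

for row in grid:
    print(row)
```

The vector list preserves precise positions and can carry attributes for each well, while the raster grid loses the exact positions but gives a uniform surface that is convenient for overlay and density analysis. That trade-off is one reason the choice depends on the research question.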

From the data producer’s standpoint, iden- 
tifying the nature and extent of the potential 
needs and uses of the data being generated 
provides the foundation for effectively sharing, 
reusing, curating, and preserving data. The
cultural practices and norms of the producer’s
discipline, including being aware of any existing
community resources, standards, or tools,
inform these data functions.


ACRL Standard Two: Access Needed Information
Efficiently and Effectively

Students need to consult common disciplinary 
and general data repositories as well as under- 
stand the formats and services through which 
data can be accessed in order to access infor- 
mation efficiently and effectively. In the Geoin- 
formatics course, students investigated several 
data sources and were required to use external 
data extensively to supplement their own data. 
Beyond the task of finding data relevant to their
research question, students found locating
supplementary data challenging because of the
variety and complexity of data formats.
Several students needed assistance converting 
data from one format to another and under- 
standing how to merge data sets with different 
resolutions or timescales. 
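
Merging data sets recorded at different timescales usually means aggregating the finer series to the coarser one before joining. The following sketch, which uses only the Python standard library, averages hypothetical hourly rainfall readings to daily values and then joins them to a daily stream-flow series on the dates the two have in common. The function names and all data values are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime


def daily_mean(hourly):
    """Aggregate (ISO timestamp, value) pairs to a mean value per day."""
    buckets = defaultdict(list)
    for stamp, value in hourly:
        day = datetime.fromisoformat(stamp).date().isoformat()
        buckets[day].append(value)
    return {day: sum(values) / len(values) for day, values in buckets.items()}


def merge_daily(left, right):
    """Join two day-keyed series, keeping only days present in both."""
    shared_days = sorted(left.keys() & right.keys())
    return {day: (left[day], right[day]) for day in shared_days}


hourly_rain = [
    ("2010-03-01T00:00", 1.0),
    ("2010-03-01T12:00", 3.0),
    ("2010-03-02T06:00", 0.0),
]
daily_flow = {"2010-03-01": 12.5, "2010-03-03": 9.0}

merged = merge_daily(daily_mean(hourly_rain), daily_flow)
print(merged)
```

Days present in only one series are dropped here. Whether to drop, interpolate, or flag such gaps is a research decision that should be documented alongside the merged data.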

Standard Two addresses these issues, as an 
information literate student “extracts, records, 
and manages the information and its sources,” 
including using “various technologies to man- 
age information selected and organized” 
(ACRL, 2000, pp. 10-11). Not only will 
DIL students need to know where data exist, 
but they also must harvest, convert, possibly 
merge, and ultimately feed it into analysis or 
visualization tools that may or may not require 
still other formats. Although a direct graft of 
classic information literacy competency stan- 
dards to DIL would focus on the process of 
bringing data into one’s research, as the faculty 
interviews revealed, these concepts are similar 
for publishing data to the world. Thus, DIL 
concepts related to this competency standard 
include data repositories, data conversion, data 
organization, sharing data, and interoperability. 


ACRL Standard Three: Evaluate 
Information Critically 

When evaluating data, students need to understand
and critically evaluate the source. They must
determine whether the research group that
provided the data is known to be reliable and/or
if the data repository or its members provide 
a level of quality control for its content. Us- 
ers also need to evaluate the data for relevancy 
and compatibility with their own research. As 
part of the quality assurance component of 
data evaluation, students need to evaluate as- 
sociated metadata. Among other attributes, 
metadata specifies the details of the experiment 
or data product, including the following: the 
conditions under which the data were collected 
or created; the apparatus or procedures used to 
generate the data; distribution information and 
access rights; and spatial and temporal resolu- 
tion, units, and parent sources. It is a vital tool 
in the evaluation of the quality and authority of 
the resource. While the ACRL standards would 
approach this from a data user perspective, the 
faculty interviewed made it clear that data pro- 
ducers need to provide quality assurance for 
data and metadata as well. 


ACRL Standard Four: Use Information
to Accomplish a Specific Purpose

In this standard, students carry out a project
and need to “communicate the product or 
performance effectively to others.” As such, 
students should use a format and employ in- 
formation technologies that best support the 
purpose of the work. Here, in the expansive 
verb “communicate” and phrase “appropriate 
information technologies,” one can assume the 
concepts of data sharing, reuse, and curation, 
as well as connections to analysis and visualiza- 
tion tools. 

In addition, this standard includes the appli- 
cation of information toward the planning and 
creation of a product, revising the development
process as appropriate along the way. These
components parallel the statements made by 
faculty on the importance of documenting the 
processes used to develop research data (the 
“product” in this case). Researchers also iden- 
tified the careful management and organiza- 
tion of data as essential in enabling its eventual 
transfer “from their original locations and for- 
mats to a new context” (as stated in Standard 
Four) for internal use by others in the project, 
or for reuse by others. 


ACRL Standard Five: Understand
Economic, Legal, and Social Issues
and Use Information Ethically

Data ethics are certainly an important
component of a well-rounded DIL program, especially
since intellectual property issues concerning
data are much less defined than, for example,
those concerning traditional textual works.
Students need to not only determine when and
how to share data, which varies among
disciplines, but also document their own sources
of data. We found students struggled with the
latter in the Geoinformatics course, as
exhibited primarily by a failure to acknowledge those
parties responsible for the data they consumed
and reused. The ethical issues surrounding
students as data producers and publishers, a
concern raised by research faculty, appear to be
entirely absent from the ACRL standards and
would be a largely novel component of a DIL
curriculum.


CORE COMPETENCIES FOR DATA 
INFORMATION LITERACY 


With information gleaned from the faculty 
interviews, the Geoinformatics course, and 
the ACRL Information Literacy Competency 
Standards, the authors propose the following
educational objectives for a DIL program.
Disciplinary implementation of these outcomes
would naturally incorporate technologies or 
techniques specific to that discipline. The fol- 
lowing are the proposed core competencies, 
organized by major theme. 


Introduction to databases and data formats. 
Understands the concept of relational 
databases and how to query those data- 
bases, and becomes familiar with stan- 
dard data formats and types for the dis- 
cipline. Understands which formats and 
data types are appropriate for different 
research questions. 

Discovery and acquisition of data. Locates 
and utilizes disciplinary data reposito- 
ries. Identifies appropriate data sources 
and can import data and convert it 
when necessary, so that it can be used 
by downstream processing tools. 

Data management and organization. Under- 
stands the life cycle of data, develops 
DMPs, and records the relationship of 
subsets or processed data to the origi- 
nal data sets. Creates standard operating 
procedures for data management and 
documentation. 

Data conversion and interoperability. Profi- 
cient in migrating data from one format 
to another. Understands the risks and 
potential loss or corruption of informa- 
tion caused by changing data formats. 
Understands the benefits of making 
data available in standard formats to fa- 
cilitate downstream use. 

Quality assurance. Recognizes and resolves 
any apparent artifacts, incompletion, or 
corruption of data sets. Utilizes meta- 
data to anticipate potential problems 
with data sets. 

Metadata. Understands the rationale for 
metadata and proficiently annotates and
describes data so it can be understood
and used by members of the work group 
and external users. Develops the ability 
to read and interpret metadata from ex- 
ternal disciplinary sources. Understands 
the structure and purpose of ontologies 
in facilitating better sharing of data. 


Data curation and reuse. Recognizes that data
may have value beyond the original purpose
(i.e., to validate research or for use
by others). Understands that curating
data is a complex, often costly endeavor
that is nonetheless vital to community-driven
e-research. Recognizes that data
must be prepared for its eventual curation
at its creation and throughout its
life cycle. Articulates the planning and
actions needed to enable data curation.

Cultures of practice. Recognizes the practices,
values, and norms of the chosen field,
discipline, or subdiscipline as they relate
to managing, sharing, curating, and preserving
data. Recognizes relevant data
standards of a field (metadata, quality,
formatting, and so forth) and understands
how these standards are applied.

Data preservation. Recognizes the benefits
and costs of data preservation. Understands
the technology, resource, and organizational
components of preserving
data. Utilizes best practices in preservation
appropriate to the value and reproducibility
of data.

Data analysis. Becomes familiar with the basic
analysis tools of the discipline. Uses
appropriate workflow management
tools to automate repetitive analysis of
data.

Data visualization. Proficiently uses basic
visualization tools of the discipline and
avoids misleading or ambiguous representations.
Understands the advantages
of different types of visualization (for
example, maps, graphs, animations, or
videos) for different purposes.

Ethics, including citation of data. Understands
intellectual property, privacy,
and confidentiality issues and the ethos
of the discipline related to sharing data.
Appropriately acknowledges data from
external sources.


The authors compared the DIL core objec- 
tives with the course syllabus from the Sci- 
ence Data Literacy curriculum of Qin and 
D’Ignazio (2010) and found similarities be- 
tween the two formulations. The chief differ- 
ence appeared to be the depth of treatment of 
different topics. While the SDL course concen- 
trated on metadata, for example, our approach 
focuses as much on the consumption of data 
(tools) as it does on documenting and annotat- 
ing data. The Geoinformatics course perhaps 
had too little coverage of metadata, but we 
found that students and faculty both needed 
as much help with data manipulation as they 
did with enhancing the documentation of their 
data. Naturally, instructors must balance using 
tools and creating interoperable infrastructure 
in teaching this type of course. 

We have alluded to the notion that a com- 
prehensive DIL program may not be entirely 
the responsibility of librarians. However, li- 
brarians who have the skills to teach database 
management and data analysis, for example, 
could teach those concepts. Indeed, learning 
those skills supports the educational mission 
of the university. However, the authors rec- 
ommend collaboration between disciplinary 
faculty and librarians as the best practice 
for teaching DIL skills. DIL needs to be 
grounded in the culture of the discipline in 
which it is embedded, and also imbued with 
the greater, communal perspective possessed 
by a librarian. 


CONCLUSION 


Thirty years ago, it was good laboratory practice
[that] you had a bound paper manual, you
took good notes, you took fifteen or twenty
data points, maybe a hundred, and you had
a nice little lab book. But we’ve scaled now
to getting this mega amount of information
and we haven’t scaled our laboratory
management practices. . . . It makes perfect sense to
me that . . . you get this [data management
skills] in people’s consciousness, make them
aware it’s a problem early on in their careers
as graduate students, before they go on and
do all the other things and get too set in their
ways. . . . And . . . that takes a fair amount of
education . . . and training. (Civil Engineer)


The authors uncovered a growing need 
among research faculty and students for DIL 
skills. As a result, the authors brought together 
data from different audiences to propose a suite 
of core DIL skills that future e-researchers need 
to fully actualize the promise of the evolving 
cyberinfrastructure. 

DIL represents an opportunity to expand 
information literacy from the library into the 
laboratory. In much the same way that libraries’ 
information literacy programs have gone be- 
yond the “one-size-fits-all” approach, librarians 
will need to go beyond a “one-size-fits-all” ap- 
proach to data management and curation liter- 
acy. The Data Curation Profiles project (Cragin, 
Palmer, Carlson, & Witt, 2010; Witt, Carlson, 
Brandt, & Cragin, 2009) indicated that differ- 
ent disciplines and subdisciplines have different 
norms and practices for conducting their re- 
search and working with data. These differences 
are manifest in the myriad ways they manage 
(or don’t manage), share (or don’t share), cu- 
rate, and preserve their research data. While we 
have provided a general summary of common 
themes from these interviews, we understand 
that any DIL program focused on a specific 
discipline needs to identify, incorporate, and 
address these specific differences in the curricu- 
lum. Models will help ascertain the educational 
needs of subdisciplines with regard to their data 
and then design DIL programs that will address 
these needs. These results serve to start a con- 
versation and propose general concepts, rather 
than to provide a final, detailed curriculum. 
Upon examination of the ACRL standards 
for information literacy, it is clear that DIL falls 
within the scope of standard library practice. 
The conceptual overlap between the ACRL 
standards and the DIL objectives indicates that 
these skills are very much aligned with librari- 
anship. With some exceptions, the ACRL stan- 
dards are written generally enough to accom- 
modate DIL skills, and indeed the standards do 
have several specific outcomes related to data. 
Still, given the ballooning interest in data man- 
agement for e-research, the new iteration of the 
standards should incorporate more data-related 
outcomes, especially from the perspective of 
the user as publisher of information. 
Additional research should be done to iden- 
tify the skill sets librarians need to support the 
DIL objectives, either as stated here or as they 
develop in practice. This will not only speed the 
development of DIL curricula, but also push 
the library community to work to adapt the 
collective DIL practice to trends in e-research. 


NOTE 


This chapter was originally published in 2011 
(Carlson, J., Fosmire, M., Miller, C., & Sapp 
Nelson, M. [2011]. Determining data infor- 
mation literacy needs: A study of students 
and research faculty. portal: Libraries and 
the Academy, 11[2], 629-657. 
http://dx.doi.org/10.1353/pla.2011.0022) as an article de- 
scribing the needs assessment research done 
by the Purdue University Libraries. The au- 
thors’ articulation of DIL competencies and 
how they relate to information literacy served 
as the springboard for the Data Information 
Literacy project. The editors feel that this 
work serves as an important milestone in DIL 
history, and as such it is reprinted here with 
minimal revision. 


INTRODUCTION 


In Chapter 1 we described the foundational 
research that generated an early articulation of 
data information literacy (DIL) and the result- 
ing 12 DIL competencies. The next step was to 
explore how our conceptions of DIL could be 
applied in practice. To do this we developed a 
3-year Institute of Museum and Library Ser- 
vices (IMLS)-funded study to further the DIL 
concept and to create and implement educa- 
tional programs for graduate students in sci- 
ence, technology, engineering, and mathemat- 
ics (STEM). The purpose of the project was 
to answer two overarching questions. First, 
what data management and curation skills are 
needed by future scientists to fulfill their pro- 
fessional responsibilities and take advantage of 
collaborative research opportunities in data- 
driven research environments? Second, how 
can academic librarians apply their expertise in 
information retrieval, organization, dissemina- 
tion, and preservation to teaching these skills? 
This chapter explains the methods and ap- 
proaches that we used in the Data Information 
Literacy project. 


KEY ASSUMPTIONS OF THE DATA 
INFORMATION LITERACY PROJECT 


Before describing the methodology of the DIL 
project in detail, we must begin by listing our 
key assumptions for this project. These as- 
sumptions served as our guiding principles in 
developing and carrying out our work. They 
are that (a) information literacy is a foundation 
for DIL; (b) graduate students are a receptive 
audience for DIL programs; (c) librarians are 
in a prime position to teach DIL skills; (d) the 
need for DIL programs has not been fully 
documented; and finally, (e) to meet this need 
successfully, librarians must align with disci- 
plinary cultures and local practices. 


Information Literacy as a Foundation 
for Data Information Literacy 


One of the key assumptions that we made in 
developing the DIL project was that we should 
take advantage of librarians’ experiences and 
long, well-documented history with informa- 
tion literacy (Rader, 2002). We deliberately 
named our project “Data Information Liter- 
acy” rather than simply “Data Literacy” for two 
reasons. First, we wanted to recognize that the 
library and education communities have in- 
vested a great deal of time and energy in under- 
standing how students learn to acquire, evalu- 
ate, and use information; this investment was 
certainly relevant in exploring how students 
develop, manage, and curate research data. In- 
formation literacy has a long history of explor- 
ing, assessing, and transforming instructional 
models and strategies to ensure their relevancy 
to particular situations and environments. Ex- 
plorations in information literacy have been 
conducted at a broad scale to make sure the 
frameworks are in sync with the aims of higher 
education (Pausch & Popp, 2000) or to align 
with advances in technologies, societal norms, 
and learning theories (Martin, 2013). Others 
are more tightly focused on particular models 
such as embedded librarianship (Kvenild & 
Calkins, 2011) or offering instruction in an 
online environment (Hahn, 2012). Data as a 
type of information have distinctions and idio- 
syncrasies that merit special consideration, but 
we believed the information literacy field could 
provide a solid foundation for our work. 
Second, DIL is an area in which librar- 
ians can make important contributions. How- 
ever, teaching students information literacy 
competencies in relation to working with re- 
search data may seem daunting to many librar- 
ians. By directly connecting work with data to 
a familiar and accepted area (i.e., information 
literacy), we hope to encourage more librar- 
ians to take action to develop DIL programs of 
their own. We believe that DIL is a logical out- 
growth of information literacy, and that 
expanding the scope of information literacy to 
include data management and curation is a 
natural extension of its concepts. 
There are a number of other initiatives that 
affirm our approach to linking data and infor- 
mation literacy. The Society of College, Na- 
tional and University Libraries (SCONUL) 
Seven Pillars of Information Literacy model 
(SCONUL Working Group on Information 
Literacy, 2011) and the Researcher Develop- 
ment Framework by Vitae (2014), a UK-based 
nonprofit organization, each incorporate data 
management skills into their definitions of infor- 
mation literacy and support holistic approaches 
to helping doctoral candidates acquire skills and 
knowledge in data management. A report from 
the Research Information Network (Goldstein, 
2011) argued that a broader interpretation of 
information literacy is needed—one that recog- 
nizes research data as information—to ensure 
that students gain the skills they will need to 
be successful in their careers. The 2012 LIBER 
working group on e-science selected research 
data as a critical area for involvement by li- 
braries in e-science support and recommended 
that libraries assist faculty with the integra- 
tion of data management into the curriculum 
(Christensen-Dalsgaard et al., 2012). 


Graduate Students as a 
Receptive Audience 


Another key assumption was the immediate 
benefit that graduate-level students may gain 
from building their skill sets in DIL concepts. 
For example, in the STEM disciplines, graduate 
students carry out the data management tasks 
for their own research, and frequently partici- 
pate in data activities to support lab/team proj- 
ects as well (Akmon, Zimmerman, Daniels, & 
Hedstrom, 2011; Westra, 2010). But Gabridge 
(2009, p. 17) observed that graduate students 
composed “a constantly revolving community 
of students who arrive with . . . uneven skills in 
data management.” 

Graduate students participate in varying lev- 
els of mentoring or apprenticeship. However, 
research data skill and competency develop- 
ment focuses on more traditional skills such as 
research design, equipment use, data analysis, 
and problem solving in the laboratory or field 
setting rather than those addressed by the DIL 
competencies (Feldman, Divoll, & Rogan- 
Klyve, 2013; Leon-Beck & Dodick, 2012). 
Furthermore, the process through which nov- 
ice researchers acquire these skills may be in- 
fluenced by social and cultural factors in their 
research teams or communities of practice 
(Feldman et al., 2013). Therefore, acquisition 
of DIL competencies by graduate students ap- 
pears to be uneven at best. 

When thinking about target audiences for 
DIL training, it is essential to evaluate the local 
landscape. Researchers appreciate training that 
has an immediate impact on their particular 
disciplinary setting; training which lacks this 
will be ignored by graduate students (Molloy 
& Snow, 2012). Interviews, surveys, and post- 
training feedback can help libraries confirm the 
types of research services which may be of in- 
terest and beneficial to graduate students and 
research faculty (Bresnahan & Johnson, 2013; 
Johnson, Butler, & Johnston, 2012). Finding 
the best approach to target graduate students 
with training was a major component of the 
DIL project. 

Knowing that graduate students were a 
prime audience, the next question was: How 
and when could we engage this audience? 
There are a number of pathways by which 
training can be provided to future scientists. 
For example, graduate students may be intro- 
duced to basic data management concepts via a 
data management module in “responsible con- 
duct of research” training (Frugoli, Etgen, & 
Kuhar, 2010). This may lead to other consul- 
tations and training opportunities. Institutions 
are also embedding training in other required 
courses and programs. While it is important to 
provide early training (Molloy & Snow, 2012), 
significant gains may be achieved by engaging 
students when they are grappling with issues 
in their own practices (Scott, Boardman, Reed, 
& Cox, 2013). Most students are interested in 
training with a strong component of imme- 
diacy and practical application (Byatt, Scott, 
Beale, Cox, & White, 2013; Parsons, 2013). 


Librarians Are in an Excellent Position to 
Teach Data Information Literacy Skills 


Librarians are in a unique position to teach 
DIL in academic environments. Graduate-level 
courses with a librarian embedded within them 
have been linked to improved student learn- 
ing (Kumar & Edwards, 2013; Kumar, Wu, 
& Reynolds, 2014), and informationists have 
been successful in deploying services to gradu- 
ate students and research teams (Hoffmann & 
Wallace, 2013; Polger, 2010). However, sur- 
veys conducted on data management show 
that very few students consult with a librarian 
on research data management (RDM) issues 
(Doucette & Fyfe, 2013). A Research Infor- 
mation Network (RIN) initiative applied the 
SCONUL Seven Pillars model of informa- 
tion literacy and Vitae’s Researcher Develop- 
ment Framework to the development of data 
management skills in postgraduate courses 
in the United Kingdom. The results demon- 
strated that a wide range of disciplines need 
data management skills and that core skills 
as well as discipline-specific training should 
be embedded into the postgraduate curricula 
(Goldstein, 2010). These findings indicate an 
opportunity for librarians to engage graduate 
students about the issues they face in working 
with research data. 


Demand for Data Information Literacy 
Programs Needs Further Exploration 


Approaches to teaching data management 
and curation for graduate students in the sci- 
ences take one of two forms: stand-alone 
courses or programs, or one-shot workshops. 
The stand-alone course 
approach has been used by several schools of in- 
formation science, including Syracuse Univer- 
sity (Qin & D’Ignazio, 2010), the University of 
Michigan (n.d.), and the Rensselaer Polytechnic 
Institute. Syracuse designed a course to teach 
science data literacy, defined as “the ability to 
understand, use, and manage science data” (Qin 
& D’Ignazio, 2010, p. 3), with a focus on pre- 
paring students for employment in science or as 
data management professionals. The University 
of Michigan developed a research fellowship 
program, Open Data, centered on building a 
community of practice around managing, shar- 
ing, and reusing scientific data. The curriculum 
includes a core course on data curation and 
elective courses from multiple disciplines. The 
Tetherless World Constellation (n.d.) research 
center at Rensselaer Polytechnic Institute offers 
a course in “data science” for graduate students 
that includes metadata, discovery, workflow 
management, data analysis, and data mining. 
One advantage of the stand-alone approach 
to teaching data skills is the depth of coverage. 
However, it may be difficult for students to 
commit to a course, especially if the course is 
outside of their discipline. 

Becoming prevalent at academic institu- 
tions, “one-shot” workshops represent a second 
approach to data management and curation 
education. Many of these workshops, such 
as those offered by MIT (Graham, McNeill, 
& Stout, 2011) and the University of Min- 
nesota (Johnston, Lafferty, & Petsan, 2012), 
help faculty and graduate students address 
requirements for data management plans by 
funding agencies. Other workshops cover data 
management as one component of a broader 
training in research ethics or responsible con- 
duct of research, as required by the National 
Science Foundation and the National Institutes 
of Health (Coulehan & Wells, 2006; Frugoli 
et al., 2010). Workshops require less of a time 
commitment and are likely to reach more peo- 
ple, but they cannot provide as much breadth 
or depth. 

As the need for students who are capable 
of managing and curating data sets continues 
to expand, we are seeing the development of 
alternative methods. In some cases, online 
and print materials provide guidance on core 
data management practices. For instance, the 
Australian National University created a Data 
Management Manual that is now in its eighth 
edition. The university’s Information Literacy 
Program uses this manual as a resource for 
teaching graduate students (Australian Na- 
tional University, 2013). Other programs have 
taken a multi-tier approach, providing semi- 
nars, lectures, and workshops; integrating data 
management into research professional devel- 
opment courses; and incorporating discipline- 
specific content for particular audiences (Byatt 
et al., 2013). 

The DIL project was a means for exploring 
the strengths and weaknesses of different ap- 
proaches in educating students about the data 
concepts they would need to be successful in 
their careers. We explored a number of possi- 
bilities for developing and delivering effective 
educational programs. Similarly, we recognized 
that DIL programs would be shaped by educa- 
tional objectives and constraints due to time, 
circumstances, and resources. Comparing mul- 
tiple approaches to developing and implement- 
ing DIL programs helps with identifying com- 
mon themes and differences across approaches. 


Alignment With Disciplinary 
Cultures and Local Practices 


Perhaps our most important assumption in 
developing and implementing the DIL project 
was that its success depended on our ability 
to understand and align with existing cultures 
of practice. We recognized that a DIL edu- 
cation program would cause the students to 
change the processes and workflows that they 
had learned previously. This deviation could 
potentially affect others who depended on the 
students’ work. We wanted to ensure that the 
DIL project would have a positive effect, not 
just for the students, but for the faculty and 
others in the lab. We needed to understand 
not only the current practices of the students 
but also faculty perceptions and reactions to 
the 12 DIL competencies that we had devel- 
oped. If the faculty or the students saw little 
value in a particular competency then there 
was no point in including it in a DIL program 
(at least initially). 

In addition to local practices, we needed 
to incorporate the perspectives and resources 
of the disciplines. Each of the disciplines with 
which we worked had articulated its own set 
of values, beliefs, and practices with regard to 
working with research data. Our DIL programs 
had to be informed by these disciplinary con- 
cepts to have the desired impact. 

The need to take context into consideration 
in developing educational programming has 
received attention in information literacy re- 
search. Librarians have largely embraced infor- 
mation literacy as one of their core missions; 
however, Lloyd and Williamson (2008) argued 
that conceptions of information literacy that 
have come out of the library and information 
science fields are too narrow. Recognizing in- 
formation skills as a part of sociocultural prac- 
tices within broader contexts enables practitio- 
ners to better understand how people engage 
with information in ways that are meaningful 
to them (Lloyd, 2010). Hoyer (2011) also ar- 
gued for moving away from a generic skills- 
based conception of information literacy and 
toward a framework that goes beyond the 
academic sector into the workplace and other 
arenas. As the social interactions and relation- 
ships within the workplace are factors in how 
information is accessed, evaluated, and used in 
workplace environments, social context ought 
to be accounted for in how information liter- 
acy is taught to students. 

The idea that curation specialists need to un- 
derstand the nuances and disciplinary practices 
of the research communities they serve is also 
taking root (Martinez-Uribe & Macdonald, 
2009; Molloy & Snow, 2012). This is extend- 
ing into education in data management and 
curation as well. Several initiatives in data man- 
agement and curation education are taking this 
approach. The Research Data MANTRA proj- 
ect at the University of Edinburgh developed 
online programs based on needs assessments 
from postgraduate programs in social science, 
clinical psychology, and geoscience (EDINA, 
n.d.). The University of Massachusetts Medi- 
cal School and Worcester Polytechnic Institute 
developed Frameworks for a Data Management 
Curriculum for teaching research data manage- 
ment to undergraduate- and graduate-level 
students in the sciences, health sciences, and 
engineering disciplines (Piorun et al., 2012). 

In some cases, training can leverage mate- 
rials created within certain research domains 
to promulgate RDM best practices, tools, and 
resources. For instance, ecologists and evolu- 
tionary biologists can find a number of articles 
about basic practices they can adopt to improve 
data sharing and reproducibility (Borer, Sea- 
bloom, Jones, & Schildhauer, 2009; Dryad, 
2014; White et al., 2013). Disciplinary frame- 
works may be useful for synthesizing a guid- 
ance document, such as the Principles for Engi- 
neering Research Data Management created by 
the University of Bath (Darlington, Ball, How- 
ard, Culley, & McMahon, 2010). 


DEVELOPING THE DIL PROJECT 


To address our goals of better understanding 
what data management and curation skills are 
needed by graduate students in science and 
engineering disciplines, and more specifically, 
what roles libraries and information science 
professionals could play in addressing these 
skills, we developed the DIL project. If we 
were successful in answering these two ques- 
tions, then the DIL project could take the next 
steps of testing an approach for library-run ed- 
ucation for DIL skills. We ultimately strove to 
build a case for models that academic libraries 
could implement for their own curricula and 
programming by designing and implementing 
case studies of DIL programs. Through our 
experiences and assessment of these programs, 
we would then move beyond the unique, in- 
dividual needs of our home institutions and 
attempt to create a dialog of these experiences 
at the community level in order to address 
data management and curation issues more 
broadly. Our findings presented in Chapter 3 
and the case studies in Chapters 4 through 8 
describe our work toward meeting these ambi- 
tious goals. 

The DIL project got its start by recruiting 
an initial cohort of librarians to partner with 
and create a series of educational programs. 
These librarians, the five DIL project teams 
listed in Table 2.1, developed expertise in 
this area by following a shared method- 
ological framework. After reviewing the process 
and outcomes of our five case studies, 
we created a guide for developing DIL 
programs (Chapter 9) comprising the materi- 
als and resources we created or applied, along 
with a detailed description of the construction 
and implementation of each of the educational 
programs that were created. In addition, we 
analyzed our work and experiences collectively 
to identify common themes or challenges, as 
well as important differences, to generate a guide 
for others seeking to develop their own DIL 
programs. Our intent in producing a guide for 
developing DIL programs and in sharing the 
materials we developed was to have them serve 
as resources for librarians and as a catalyst for 
creating a community of practice. 


Structure of the Project 


To carry out the DIL project we recruited li- 
brarians to form five project teams based at 
four different locations: two at Purdue Uni- 
versity and one each at Cornell University, the 
University of Minnesota, and the University 
of Oregon. We recognized that a diverse set of 
perspectives and skill sets would be required 
to ensure the success of each project team and 
so each team was composed of three people: a 
data librarian, a subject librarian or informa- 
tion literacy librarian, and a faculty researcher 
from a science or engineering discipline. The 
data librarians applied their knowledge of data 
management and handling and data curation 
standards and best practices to inform a DIL 
program for the project team. The subject spe- 
cialist librarians brought their knowledge of 
the information ecologies of the particular dis- 
ciplines they served to ensure that their DIL 
program would be relevant to the specific disci- 
plinary needs. On two of the project teams the 
data librarian and subject specialist roles were 
represented in one person, given the nature 
of their job responsibilities. On these teams, 
we recruited a librarian with knowledge and 
expertise in information literacy to serve as 
a resource in developing the team’s DIL pro- 
gram. The information literacy experts on the 
project also served as resources to the DIL proj- 
ect as a whole and were invaluable in shaping 
the overall direction of the project. The third 
team member, a faculty researcher, contributed 
to the team’s understanding of their research 
community standards and practices in working 
with data. They allowed their research group to 
be interviewed and observed, and were inter- 
viewed themselves to enable us to obtain this 
understanding. In addition, they collaborated 
with their project team on the construction 
and deployment of the educational programs 
for their students. We believed that having a 
direct connection with a faculty researcher was 
essential to ensure that the resulting DIL pro- 
gram was directly relevant to their students. 
The five DIL teams in this project are outlined 
in Table 2.1. 


IMPLEMENTING THE DATA 
INFORMATION LITERACY PROJECT 


Our proposal to carry out the DIL project was 
funded by the IMLS in October of 2011. The 
project was implemented in five stages: 

1. Conducting an environmental scan and 
literature review 

2. Interviewing faculty and students 

3. Creating the DIL program 

4. Teaching the DIL program 

5. Assessing its impact 


The details of the work performed by each 
of the project teams in developing and imple- 
menting their individual DIL programs are 
in the case studies presented in Chapters 4 
through 8. 


TABLE 2.1 

The Five DIL Project Teams and Their Composition 

Institution              Discipline                               Data Librarian    Subject Librarian/Information Literacy Specialist 
Purdue University        Electrical and computer engineering      Jake Carlson      Megan Sapp Nelson 
Purdue University        Agricultural and biological engineering  Marianne Bracke   Michael Fosmire 
Cornell University       Natural resources                        Sarah Wright      Camille Andrews 
University of Minnesota  Civil engineering                        Lisa R. Johnston  Jon Jeffryes 
University of Oregon     Ecology/landscape architecture           Brian Westra      Dean Walton 


Conducting an Environmental 
Scan and Literature Review 


Each of the five teams identified disciplinary 
resources and perspectives by conducting an 
environmental scan of the scholarly literature, 
reports, and other material produced by re- 
searchers in the discipline and subdiscipline of 
their faculty partner for information pertain- 
ing to the DIL competencies. Each team per- 
formed an environmental scan of existing data 
repositories, digital libraries, metadata schema, 
and other resources, standards, and best prac- 
tices for their discipline or subdiscipline. They 
shared and discussed results of the literature re- 
view and environmental scan to identify com- 
mon themes. 


Interviews of Faculty and Students 


The next stage was to conduct interviews with 
our faculty partners and graduate students. 
These interviews were question-based using a 
script and workbook; however, interactive ele- 
ments were incorporated when possible, allow- 
ing the interviewers and interviewees to share 
stories and ask questions (Ellis, 2008). We had 
two objectives in conducting the interviews. 
First, we wanted to gain an understanding of 
current practices with regard to handling, man- 
aging, and curating data in the labs of our fac- 
ulty partners. In addition to getting a sense of 
the kinds of data being generated in the lab, we 
sought to better understand local policies and 
practices with data. In particular we wanted to 
understand where and how graduate students 
acquired their knowledge and skills in working 
with data and how effective they were in doing 
so. Second, we wanted to gain an understanding 
of the educational needs of graduate students 
with regard to data from the perspective of the 
faculty and the graduate students. We sought 
to obtain this understanding through applying 
the 12 DIL competencies that we had genera- 
ted from previous research (see Chapter 1) and 
asking our interviewees to review and react to 
them. In developing the interview protocol, we 
revisited our initial conceptions of the 12 DIL 
competencies and revised them both to stream- 
line them and to ensure adequate coverage of 
the potential topic areas for our educational 
programs. 

Our belief, which was later confirmed in the 
literature reviews and environmental scans, was 
that individual disciplines would have unique 
interpretations, perspectives, and motivations 
surrounding the management, dissemina- 
tion, and curation of data. In the interviews, 
we asked faculty and students to use a 5-point 
Likert scale to indicate how important they felt 
it was for graduate students to acquire each 
of these competencies before they graduated. 
We then followed up with several questions to 
learn why they assigned each competency the 
rating they did. 
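
The rating step described above lends itself to a simple tabulation. The following Python sketch is a minimal illustration, not the project's actual analysis: the ratings are hypothetical placeholders rather than interview data, and the function name summarize_ratings and the "faculty" and "student" group labels are assumptions introduced here. It reports the mean, median, and number of responses for each competency and group. 

```python
from statistics import mean, median

# Hypothetical 5-point Likert ratings (1 = lowest importance, 5 = highest).
# These values are placeholders for illustration, NOT data from the DIL project.
ratings = {
    "Data management and organization": {
        "faculty": [5, 4, 5],
        "student": [4, 4, 3, 5],
    },
    "Data preservation": {
        "faculty": [3, 4, 2],
        "student": [3, 2, 4, 3],
    },
}


def summarize_ratings(ratings):
    """Return {competency: {group: (mean, median, n)}} for 1-5 Likert ratings."""
    summary = {}
    for competency, groups in ratings.items():
        summary[competency] = {}
        for group, values in groups.items():
            # Reject anything outside the 5-point scale.
            if any(v not in range(1, 6) for v in values):
                raise ValueError(f"rating outside 1-5 for {competency!r}/{group!r}")
            summary[competency][group] = (
                round(mean(values), 2),
                median(values),
                len(values),
            )
    return summary


if __name__ == "__main__":
    for competency, groups in summarize_ratings(ratings).items():
        for group, (avg, mid, n) in groups.items():
            print(f"{competency} [{group}]: mean={avg}, median={mid}, n={n}")
```

Because Likert responses are ordinal, the median is reported alongside the mean. A summary like this shows only how a competency was rated, so the follow-up questions about why remain essential. 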

We also believed that faculty and students 
would have their own terminologies and defi- 
nitions for the concepts and activities that en- 
compassed research data from their disciplinary 
practices, which may vary from the terms and 
definitions used by library science and informa- 
tion professionals. These two factors made it 
difficult, if not impossible, for us to craft defini- 
tions for each of the 12 competencies. For ex- 
ample, there is yet to be a universally recognized 
definition for data quality that would be under- 
stood by everyone we intended to interview. In 
fact, having such firm definitions would have 
been counterproductive for our purposes. We 
wanted the faculty and students to provide us 
with their perspectives on the 
knowledge and skills that were important to 
them and to their discipline. Asking them to 
react to a definition as articulated by librarians 
could have resulted in responses with limited 
value in informing educational programming 
for that discipline. Ultimately, we viewed the 
12 DIL competencies as starting points for a 
broader conversation between the librarians on 
the DIL project and the faculty and students. 

Instead of attempting to craft authoritative 
and universal definitions of the competencies, 
we listed particular skills or abilities that could 
be included as a component of the competency. 
We invited the interviewees to suggest other 
skills that they would consider to fall under each 
of the competencies. Although this led to some 
overlapping discussions, this approach enabled 
us to gain a more thorough and nuanced un- 
derstanding of faculty and student perspectives. 
The 12 data competencies and the skills that we 
associated with each of them for the purposes of 
the interview are listed in Table 2.2. 

The interview protocol was based on the 
structure of the Data Curation Profiles Tool- 
kit developed at Purdue University (http:// 
datacurationprofiles.org). It consisted of an 
interview worksheet, with questions for the 
interviewee to complete in writing during the 
interview, and an interviewer’s manual, which 
contained follow-up questions for the inter- 
viewer to ask based on the written responses of 
the interviewee. Our interview instruments are 
available for download at 
http://dx.doi.org/10.5703/1288284315510. 

The interviews were conducted in the 
spring and summer of 2012. Eight of the 
interviews were with faculty. The other 17
interviews were with current or former graduate
students or postdocs of the interviewed
faculty, or in one case with a lab technician.
Each DIL project team compiled and analyzed
its own ratings and responses to inform
the development of its program. Each team
wrote a summary of results and shared it with
other members of the DIL project at an in-person
project meeting. The overall findings
for each of the 12 competencies are reported
in Chapter 3.

TABLE 2.2 The 12 DIL Competencies and the Skills Used to Associate With Each Competency for the DIL Project Interviews

Cultures of practice
    Recognizes the practices, values, and norms of field, discipline, or subdiscipline as they relate to managing, sharing, curating, and preserving data
    Recognizes relevant data standards of field (e.g., metadata, quality, formatting) and understands how these standards are applied

Data conversion and interoperability
    Is proficient in migrating data from one format to another
    Understands the risks and potential loss or corruption of information caused by changing data formats
    Understands the benefits of making data available in standard formats to facilitate downstream use

Data curation and reuse
    Recognizes that data may have value beyond the original purpose, to validate research, or for use by others
    Is able to distinguish which elements of a data set are likely to have future value for self and for others
    Understands that curating data is a complex, often costly endeavor that is nonetheless vital to community-driven e-research
    Recognizes that data must be prepared for its eventual curation at its creation and throughout its life cycle
    Articulates the planning and activities needed to enable data curation, both generally and within his or her local practice
    Understands how to cite data as well as how to make data citable

Data management and organization
    Understands the life cycle of data, develops data management plans, and keeps track of the relation of subsets or processed data to the original data sets
    Creates standard operating procedures for data management and documentation

Data preservation
    Recognizes the benefits and costs of data preservation
    Understands the technology, resources, and organizational components of preserving data
    Utilizes best practices in preparing data for its eventual preservation during its active life cycle
    Articulates the potential long-term value of own data for self or others and is able to determine an appropriate preservation time frame
    Understands the need to develop preservation policies and is able to identify the core elements of such policies

Data processing and analysis
    Is familiar with the basic data processing and analysis tools and techniques of the discipline or research area
    Understands the effect that these tools may have on the data
    Uses appropriate workflow management tools to automate repetitive analysis of data

Data quality and documentation
    Recognizes, documents, and resolves any apparent artifacts, incompletion, or corruption of data
    Utilizes metadata to facilitate an understanding of potential problems with data sets
    Documents data sufficiently to enable reproduction of research results and data by others
    Tracks data provenance and clearly delineates and denotes versions of a data set

Data visualization and representation
    Proficiently uses basic visualization tools of discipline
    Avoids misleading or ambiguous representations when presenting data in tables, charts, and diagrams
    Chooses the appropriate type of visualization, such as maps, graphs, animations, or videos, based on an understanding of the reason/purpose for visualizing or displaying data

Databases and data formats
    Understands the concept of relational databases and how to query those databases
    Becomes familiar with standard data formats and types for the discipline
    Understands which formats and data types are appropriate for different research questions

Discovery and acquisition of data
    Locates and utilizes disciplinary data repositories
    Evaluates the quality of the data available from external sources
    Not only identifies appropriate external data sources, but also imports data and converts it when necessary, so it can be used locally

Metadata and data description
    Understands the rationale for metadata and proficiently annotates and describes data so it can be understood and used by self and others
    Develops the ability to read and interpret metadata from external disciplinary sources
    Understands the structure and purpose of ontologies in facilitating better sharing of data

Ethics and attribution
    Develops an understanding of intellectual property, privacy and confidentiality issues, and the ethos of the discipline when it comes to sharing and administering data
    Acknowledges data from external sources appropriately
    Avoids misleading or ambiguous representations when presenting data

With what was learned from the environ- 
mental scan and the interviews, each team 


developed a DIL program that included de- 
fined learning goals, educational interven- 
tions, and metrics for assessment. In addition 
to crafting the content of their DIL program, 
each team negotiated an approach for deliv- 
ering the content with their faculty partners, 
as shown in Table 2.3. The approach selected 
by each team depended on a number of fac- 
tors, including existing norms and structures 
of the lab, the amount of time the faculty and 
students had available to accommodate a DIL 
program, and available resources to support 
the program.

TABLE 2.3 Approaches for Delivering a DIL Program Taken by the Five DIL Teams

Institution              Discipline                               Approach
Purdue University        Electrical and computer engineering      Embedded librarianship
Purdue University        Agricultural and biological engineering  Series of workshops
Cornell University       Natural resources                        6-week mini-course
University of Minnesota  Civil engineering                        Hybrid in-person/online course
University of Oregon     Ecology/landscape architecture           One-shot seminar

Each of the project teams delivered their
educational program in the fall of 2012, with
the exception of the project team at Cornell,
which delivered their program in the spring
of 2013. The team members recorded their
experiences with what worked well and what
might be improved, as well as their general
impressions and feelings about the delivery of
their program. As a part of their program, each
team developed assessment mechanisms to determine
their success in implementing their
learning goals and objectives. In addition to
student achievement, student and faculty attitudes
were assessed to determine the relevancy
and effectiveness of the instruction. The five
teams then conducted a collective analysis of
the educational interventions to identify patterns
and commonalities across experiences in
developing DIL programs, as well as account
for any significant differences. Finally, the
teams wrote detailed reports on their programs
and educational approaches. Each account was
analyzed and recommendations were made for
future iterations of their program. The lessons
learned were built into a guide for other
practicing librarians presented in this book in
Chapter 9.

The DIL project wrapped up in the fall of 
2013 with a 2-day Data Information Literacy 
Symposium held at Purdue University. The in- 
tent of the symposium was to exchange infor- 
mation and consider ways and means of build- 
ing a community of practice on DIL. At the 
symposium, each of the DIL teams presented 
their work and shared their experiences through 
presentations, discussions, and hands-on exer- 
cises. The 80-plus librarian and information 
professional participants were invited to share 
their own experiences in teaching data compe- 
tencies at their institutions through multiple 
directed discussions and activities. Chapter 11 
reports on the many areas of consideration for 
the continued development of DIL that were 
identified at the symposium and suggests pos- 
sible avenues for moving forward. 


CONCLUSION 


Our overarching goals in implementing the
DIL project were to gain a better understand- 
ing of how librarians could develop educational 
programs on data management and curation 
topics and then to articulate directions for the 


academic library community to act on the op- 
portunities presented in this area. We developed 
the overarching methodology and approach 
outlined in this chapter for this purpose. How- 
ever, we found that the five DIL project teams 
diverged from each other in content and ap- 
proach to develop a high-quality DIL program 
for their project partner. The second section 
of this book describes the work of each of the 
DIL project teams. The third section articulates 
what we learned collectively from our experiences
and charts a course toward further developing
the 12 DIL competencies and forming
a community of practice on DIL.


NOTE 


Portions of this chapter are reprinted from 
Carlson, J., Johnston, L., Westra, B., & Nich- 
ols, M. (2013). Developing an approach for 
data management education: A report from 
the Data Information Literacy project. International
Journal of Digital Curation, 8(1), 204-217.
http://dx.doi.org/10.2218/ijdc.v8i1.254


INTRODUCTION


This chapter delves into the results of the user 
needs assessments we conducted for the Data 
Information Literacy (DIL) project and intro- 
duces the instructional interventions we devel- 
oped to address those needs. Between March 
2012 and June 2012, the five DIL project 
teams collectively interviewed 25 researchers (8 
faculty and 17 graduate students or postdocs) 
on their DIL (instrument available at
http://dx.doi.org/10.5703/1288284315510). We begin
this chapter by presenting the broad themes
that were uncovered across the interviews from 
our analysis. We then turn our attention to the 
responses given to each of the 12 DIL com- 
petencies by the faculty and students that we 
interviewed. 


RESULTS OF THE 
DATA INFORMATION 
LITERACY INTERVIEWS 


The results of the five case studies (presented 
in Chapters 4 through 8) revealed similarities 
and differences between faculty and students in 
how they perceived the importance of the DIL 
competencies for graduate students. Due to the 
small sample size and the use of convenience 
sampling, these results cannot be generalized 
outside of these case studies as indicators of 
each discipline’s importance ranking. Nevertheless,
the findings offer a useful starting point
for larger investigations into the current envi- 
ronment of the educational needs of graduate 
students. 

The DIL competency ratings based on a 
5-point Likert scale are displayed in Figure 3.1. 
They show that, on average, participants valued
each competency as either “important,” “very 


important,” or “essential.” However, there was 
considerable variance in the responses received 
as indicated by the high standard deviations 
(ranging from .75 to 1.02). The competen- 
cies that pertained more directly to keeping 
a research lab operational and to publishing 
outputs, such as data processing and analysis, 
data visualization and representation, and data 
management and organization, tended to be 
rated more important than competencies that 
are less central to these activities, such as dis- 
covery and acquisition and data preservation. Al- 
though deemed important, some of the lower 
rated competencies, such as data preservation, 
are difficult to address. In the interviews, many 
faculty stated that they lacked the experience or 
knowledge to educate students effectively about 
these competencies. Several of the faculty and 
students questioned whether their field had a 
culture of practice in managing, handling, or 
curating data. 
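
As an illustration only, the following minimal sketch shows how an average rating and a standard deviation of the kind reported above might be computed from 5-point Likert responses. The ratings in `example_ratings` are hypothetical and are not data from the DIL interviews; the function name `summarize` and the use of the sample standard deviation are also assumptions, since the chapter does not state which form was used.

```python
from statistics import mean, stdev

# Scale labels as given in the Figure 3.1 caption.
SCALE = {
    5: "essential",
    4: "very important",
    3: "important",
    2: "somewhat important",
    1: "not important",
}

# Hypothetical ratings for a single competency; NOT data from the DIL interviews.
example_ratings = [5, 4, 4, 3, 5, 2, 4, 3]


def summarize(ratings):
    """Return (mean, sample standard deviation), each rounded to two decimals."""
    return round(mean(ratings), 2), round(stdev(ratings), 2)


avg, sd = summarize(example_ratings)
# Map the average to the nearest scale label for a rough verbal reading.
print(f"average = {avg} ({SCALE[round(avg)]}), standard deviation = {sd}")
```

A standard deviation near 1 on a 5-point scale, as in this made-up example, means that individual ratings commonly fall a full scale point away from the average, which is why the averages alone should be read with caution.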

Figure 3.1 also shows the differences in how 
the participants viewed some of the competen- 
cies. Faculty generally placed a higher value on 
student development of competencies in ac- 
tively working with data (e.g., data processing 
and analysis, data visualization and representa- 
tion) and in competencies that would sustain 
the value of the data over time (e.g., metadata 
and data description, data quality and documen- 
tation) than the students did. Students gave the 
discovery and acquisition of data competency 
a higher rating than did the faculty. Students 
indicated in the interviews that this was an im- 
portant component of learning their field and 
contextualizing their research. Two of the fac- 
ulty who worked with code as their data gave 
data management and organization a lower rat- 
ing than did the other participating faculty. 
One faculty member believed that, individually,
students should know how to manage their
own data but did not necessarily need to know
how to develop systems or management plans
for larger units. The other found it difficult to
respond, not knowing what constituted good
management practice and therefore unable to
say if it would be worth the investment of his
and the students’ time.

Figure 3.1 Graphical comparison of faculty and student ratings of importance of DIL
competencies. Scale: 5 = essential; 4 = very important; 3 = important; 2 = somewhat
important; 1 = not important.


THEMES FROM THE 
DATA INFORMATION 
LITERACY INTERVIEWS 


Analyzing the interview transcripts revealed 
several commonalities across the five case stud- 
ies: the lack of formal training in data manage- 
ment, the absence of formal policies governing 
lab data, self-directed learning through trial and 
error, and a focus on mechanics over concepts.

None of the five research groups provided
their students formal training in data
management. Instead, faculty reported that they
expected that their students had acquired most 
of these and other competencies prior to join- 
ing their lab. As a University of Oregon faculty 
member noted, “[students may have] picked up 
[their skills] at on-the-job training, because a 
lot of them had a former life in a professional 
field . . . or [it’s] something they got as an un- 
dergraduate.” In contrast, student interviews re- 
vealed wide variations in their prior experiences 
with data. Most of the students had attended a 
seminar on responsible conduct of research (re- 
search ethics) but reported that data practices 
were not covered in the seminar. Moreover, 
these students could not recall the specifics of 
what was stated about data practices. It should 
be noted here that none of the five case stud- 
ies involved data that would require training on 
dealing with human subjects or sensitive data.

In lieu of formal training, most graduate
students learned data management through
trial and error, reading manuals, asking their
peers for help, or searching the Internet. Of 
the five labs participating in this project, only 
one had written policies for the treatment and 
handling of data. Respondents predominantly 
expressed disciplinary norms and processes for 
data management as underlying expectations 
that tended to be delivered informally and ver- 
bally. Some of the students interviewed had in- 
herited data from previous students or others 
in the lab; this transference process also tended 
to be informal with minimal introduction to 
the data. 

Faculty expected their graduate students to 
be independent learners. For example, one fac- 
ulty member summed up the skills acquisition 
process as the “pain and suffering method,” 
which she described as “[graduate students] 
try it, they fail, they see what failed, they come 
back to their advisor and you say, ‘Ah, well
maybe you should try X.’ It is not something
that we have attempted to teach, certainly.” 

When asked how well their students had 
mastered the DIL competencies, faculty stated 
that students tended to focus more on the 
mechanics of working with or analyzing data 
rather than the theories and assumptions un- 
derlying the software or tools they used. For 
this reason, some of the faculty expressed con- 
cern that students’ understanding of these com- 
petencies may be somewhat superficial. For in- 
stance, one faculty member stated that students 
may be able to collect data from a sensor, but 
they did not necessarily understand the equip- 
ment variables that might impact data quality 
or accuracy. They may be more focused on get- 
ting the data than on understanding the steps 
and settings that created it. Similarly, some 
faculty felt that though students may be able 
to use tools to work with data, they did not 
always use them very effectively or efficiently. 
For example, one faculty member commented, 


“I certainly think that they learn basic visual- 
ization tools, but there’s a difference between 
learning how to draw a histogram and how to 
draw a histogram that’s informative and easy 
to read.” 

This distinction between basic project-driven
skills and deeper, transferable understanding
also appeared in responses to questions about managing
and curating data. Most students described
idiosyncratic methods of data management, 
and generally overestimated the capacity of 
their methods to support local collaboration. 
Only 3 of the 7 faculty interviewed felt that 
their students provided enough information 
about their data for the faculty member to un- 
derstand it. Only one faculty member thought 
that students provided enough information for 
a researcher outside of the lab to understand 
and use the data. In contrast, 15 of the 17 stu- 
dents believed that they provided sufficient 
information for someone outside of the lab to 
understand and use the data. 

Faculty wanted their students to acquire a
richer understanding of and appreciation for good
data management practices, but there were sev- 
eral barriers that restricted faculty from taking 
action. First, spending time on data manage- 
ment was not a priority if it distracted from or 
delayed the research process. Faced with this 
pressure, faculty accepted that a minimal skill 
set was sufficient for their students to succeed 
in school. One faculty member stated, “[Stu- 
dents] can do their work without understand- 
ing this. It’s not essential that they have this. It’s 
best if they do, but they don’t. I guess I could 
be doing more, but we don’t talk about all of
these functions. . . . I’m not sure they all understand
why data has to be curated.”

Second, faculty did not see themselves as 
having the knowledge or resources to impart 
these skills to their students themselves. One 
faculty member mentioned requirements by
funding agencies for data management plans
and journals accepting supplemental data files
as positive steps, but said that researchers in her
field were ill-prepared to respond. Most of the faculty
stated that there were no best practices
in data management in their particular field.
Faculty in this study did not believe that funding
agencies, publishers, or scholarly societies
in their discipline provided the guidance or resources
to support effective practices in managing,
sharing, or curating data. In the absence
of such support, the data practices in their labs
remained centered on local needs.

It is interesting to note similarities between 
our findings and the findings of others who 
have studied faculty perceptions of student 
competencies in information literacy. Shelley 
Gullikson (2006) surveyed faculty at institu- 
tions in eastern Canada to understand their 
perceptions of the ACRL Information Literacy 
Competency Standards. Her results indicated 
a consensus that information literacy compe- 
tencies were important overall, but little agree- 
ment on when they should be taught. Claire 
McGuinness (2006) conducted semi-structured
interviews with sociology and civil engineering 
faculty in the Republic of Ireland and found 
that faculty believed that students were acquir- 
ing information literacy competencies without 
formal or direct instruction but through other 
existing learning situations and course work. 
More recently, Sharon Weiner (2014) surveyed
faculty at Purdue University to understand
the extent to which information
literacy concepts were taught by faculty
across the disciplines. In addition to revealing
significant differences among the schools in which
aspects of information literacy were taught,
faculty responses indicated that they
expected their students to know how to avoid
plagiarism, search for information, and define a
research topic before enrolling in their courses.


FINDINGS ON EACH OF 
THE 12 DATA INFORMATION 
LITERACY COMPETENCIES 


The rest of this chapter will discuss findings 
on the 12 DIL competencies across the inter- 
views conducted by the five DIL project teams. 
Subsequent chapters describe the more specific 
findings by each project team and how the 
teams translated these findings into educational 
programs. Each of the competencies presented 
here includes the loosely worded skills descrip- 
tion that was provided to the interviewees to 
ground the discussion, as well as any additional 
skills that they themselves articulated. Next,
we summarize a curated list of responses from
both faculty and students.


Cultures of Practice


Table 3.1 summarizes the results of our inter- 
viewee responses regarding the cultures of prac- 
tice competency. 


Faculty Responses 

A major concern of faculty was the amount of 
prior training graduate students received with 
respect to cultures of practice for data. One fac- 
ulty member described students’ knowledge in 
this area as “underwhelming.” Faculty felt that 
though students adequately saved their files and 
made backup copies, they were not as compe- 
tent with sharing, curating, and preserving data. 
On the other hand, several faculty members 
commented that they themselves were unaware 
of any established practices, values, or norms for 
a data “culture of practice” in their discipline. 
For example, a computer science faculty mem- 
ber pointed out that knowing how to document 
research properly, and being able to go back to 
it in the future, is a discipline-wide issue.

TABLE 3.1 Faculty and Student DIL Competency Ratings of Importance: Cultures of Practice

Competency-related skills:
    Recognizes the practices, values, and norms of chosen field, discipline, or subdiscipline as they relate to managing, sharing, curating, and preserving data
    Recognizes relevant data standards of field (e.g., metadata, quality, formatting) and understands how these standards are applied

Additional skills:
    Identifies standard protocols in the lab that may or may not match discipline-wide standards

Faculty and student ratings:*
    Faculty average = 3.71
    Student average = 3.88

*Ratings based on a 5-point Likert scale: 5 = essential; 4 = very important; 3 = important; 2 = somewhat important; 1 = not important.


Overall, faculty believed that guidance in 
this area would be beneficial. Although
faculty recognized the importance of obtaining
skills through experience or peer teaching, they
wanted formal training to be available so
that established practices and norms might be 
followed in the lab and the discipline. One par- 
ticipant described an ideal course for learning 
cultures of practice in the discipline that would 
include attitudes, shared skills (e.g., scripting 
language), visualization techniques, and tech- 
nical writing training for describing results ac- 
cording to cultural norms. 


Student Responses 

The students we interviewed were unaware of 
any standards or discipline-wide norms for or- 
ganizing, documenting, and sharing data. Yet, 
they recognized that this would be useful and 
important. One student stated that if research- 
ers did not adhere to the standards of their 
field, “the results will not mean as much.” Several
students also mentioned that they would follow
standards if such standards existed. One computer
science student mentioned that metadata
standards in academia and industry appear to 
be at odds, with a greater amount of metadata 
being required in industry. As many graduate 


students take positions outside of academia af- 
ter graduation, developing an understanding of 
industry norms and expectations in working 
with data is a critical element of effective edu- 
cational programs. 


Data Conversion and Interoperability 


Table 3.2 summarizes the results of our inter- 
viewee responses regarding the data conversion 
and interoperability competency. 


Faculty Responses

Most faculty reported that competencies with 
data conversion and interoperability were gen- 
erally underdeveloped in students. Faculty 
reported that their students acquired their 
knowledge and skills in this competency 
through classes, peers, and experience. One 
faculty member stated that his students needed 
more experience with how conversion can af- 
fect their data. Another mentioned that stu- 
dents need to be aware of issues surrounding 
data loss during data migration and have an 
understanding of appropriate open standards 
for file formats. 

Potential data loss in the conversion process
was mentioned repeatedly. Faculty reported
that students were not considering the potential
for loss or corruption when converting
their data files. One faculty member made a
connection between understanding how data
can be manipulated and ensuring the quality of
the data. Another saw this as an important skill
for students to develop not just for working in
his lab but also for gaining employment after
graduation.

TABLE 3.2 Faculty and Student DIL Competency Ratings of Importance: Data Conversion and Interoperability

Competency-related skills:
    Is proficient in migrating data from one format to another
    Understands the risks and potential loss or corruption of information caused by changing data formats
    Understands the benefits of making data available in standard formats to facilitate downstream use

Additional skills:
    Ability to code
    Understands the advantages of different file formats

Faculty and student ratings:*
    Faculty average = 4.13
    Student average = 4.24

*Ratings based on a 5-point Likert scale: 5 = essential; 4 = very important; 3 = important; 2 = somewhat important; 1 = not important.


Student Responses 

Nearly all of the students (14 out of 17) re- 
ported converting data as a part of their work 
in the lab, though most did not mention con- 
version as a distinct stage of the data life cycle. 
Students responded to questions about data conversion
and interoperability by discussing conversion
techniques for raw data (e.g., Microsoft
Access files to plain text files; proprietary sensor
data to Microsoft Excel) as well as processed
data (e.g., converting images created in gnuplot
to GIF or JPEG; converting a figure to a table).
Conversions ranged from a simple cut-and- 
paste transportation of data to identifying the 
meaningful elements of the data and extracting 
them into a usable format. Students were less 
concerned with data loss during the conversion 
process than faculty were. A few students reported


checking the data after converting them to en- 
sure that data loss had not occurred. 
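
The kind of post-conversion check that a few students described can be sketched in code. The example below is a minimal illustration, not a description of any interviewee’s actual workflow: it converts a small set of readings to CSV text with a fixed number of decimal places, reads the result back, and reports which rows no longer match the original. The readings, the two-decimal format, and the helper names (`convert_to_csv`, `read_csv`, `find_changed_rows`) are all hypothetical.

```python
import csv
import io


def convert_to_csv(rows, float_format="{:.2f}"):
    """Write rows to CSV text, formatting floats with a fixed precision."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow(
            [float_format.format(v) if isinstance(v, float) else v for v in row]
        )
    return buffer.getvalue()


def read_csv(text):
    """Read two-column CSV text back, parsing the second column as a float."""
    return [(name, float(value)) for name, value in csv.reader(io.StringIO(text))]


def find_changed_rows(original, converted):
    """Return (before, after) pairs for rows that differ after conversion.

    zip() stops at the shorter input, so row counts should be compared
    separately to catch rows that were dropped entirely.
    """
    return [(a, b) for a, b in zip(original, converted) if a != b]


# Hypothetical sensor readings; NOT data from the DIL project.
readings = [("sensor_a", 0.125), ("sensor_b", 2.5), ("sensor_c", 3.14159)]

converted = read_csv(convert_to_csv(readings))
changed = find_changed_rows(readings, converted)

print(f"rows before: {len(readings)}, rows after: {len(converted)}")
print(f"{len(changed)} row(s) changed during conversion")
for before, after in changed:
    print(f"  {before} -> {after}")
```

In this sketch the loss comes from rounding during export, which is only one of many ways a conversion can alter data. The same compare-after-converting pattern applies to other cases, such as truncated text fields or changed date formats.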


Data Curation and Reuse 


Table 3.3 summarizes the results of our inter- 
viewee responses regarding the data curation 
and reuse competency. 


Faculty Responses 

Faculty viewed data curation and reuse as an 
important subject, but commented that both 
students and the researchers themselves lacked 
these skills. In fact, several commented that the 
idea of data reuse is just beginning to take hold. 
One faculty member commented that the en- 
tire research lab needed a better understand- 
ing of who would benefit from data curation. 
Another felt that students generally don’t have 
to concern themselves with these skills as the 
researcher decides when and how to make the 
data available for reuse. 

Faculty also had a more personal reason for
believing data curation and reuse to be important.
In their experience, their data could
not be recreated over the course of extended
experiments and consequently must be curated.
Therefore, they were the number one
reuse consumers of their own data. Similarly,
faculty commented that the academic culture
places less emphasis on functionality of data for
public use and rather focuses more on the researchers’
needs. Not all data are viable for curation,
however, as one faculty member noted;
nonstandard code was not reusable and didn’t
promote future research.

TABLE 3.3 Faculty and Student DIL Competency Ratings of Importance: Data Curation and Reuse

Competency-related skills:
    Recognizes that data may have value beyond the original purpose, to validate research, or for use by others
    Is able to distinguish which elements of a data set are likely to have future value for self and for others
    Understands that curating data is a complex, often costly endeavor that is nonetheless vital to community-driven e-research
    Recognizes that data must be prepared for its eventual curation at its creation and throughout its life cycle
    Articulates the planning and activities needed to enable data curation, both generally and within local practice
    Understands how to cite data as well as how to make data citable

Additional skills:
    None

Faculty and student ratings:*
    Faculty average = 4.25
    Student average = 4.06

*Ratings based on a 5-point Likert scale: 5 = essential; 4 = very important; 3 = important; 2 = somewhat important; 1 = not important.

Faculty were also asked whether they or their 
graduate students had ever deposited data into 
a data repository. Of the eight faculty inter- 
viewed, three had deposited data in a reposi- 
tory, three had not, and two did not answer the 
question. Those who had done so deposited their code
into SourceForge or Google Code. However,
faculty reported that getting the software in a 
format in which it could be shared was difficult. 


Student Responses 

Students identified at which stages their data 
(raw vs. processed vs. published) would be 
most valuable to save, but the potential value 
for reuse in the data they created was not an 
immediate concern. Rather, students did not 


appear to understand the practices and skills 
that would be needed to support the reuse of 
their digital information. For example, one stu- 
dent believed that individuals in the lab were 
taking the necessary steps to prepare the gener- 
ated data for eventual reuse, but was unsure of 
“exactly what they’re doing.” 

Of the 17 students interviewed, 7 indicated
that they had deposited data into a repository 
for reuse, though some of them indicated that 
these repositories were for a particular agency 
and not publicly accessible. Students were al- 
most evenly split about their intent to deposit 
data into a repository in the future, with 7 in- 
dicating that they were planning to do so and 
6 stating that they were not. Four students responded
“I don’t know” to the question. Almost
all of the students we interviewed were
willing to share their data with someone outside
of their lab, with only one student responding
“no” and one other stating “I don’t know.” Several
students said they would need their advisor’s
approval before sharing their data. However,
12 of the 15 students who indicated they
would share their data also stated that they
would place conditions on sharing the data.
The other 3 students responded “I don’t know.”
The most common condition was that the student
or the lab receive proper credit through
a citation if the data were used in a publication.
Other conditions mentioned were no redistribution
of the data before publication of
the findings of the lab of origin, and assurance
that the data would not be misinterpreted by
the recipient.


Data Management and Organization


Table 3.4 summarizes the results of our interviewee
responses regarding the data management
and organization competency.

TABLE 3.4 Faculty and Student DIL Competency Ratings of Importance: Data Management and Organization

Competency-related skills:
    Understands the life cycle of data, develops data management plans, and keeps track of the relation of subsets or processed data to the original data sets
    Creates standard operating procedures for data management and documentation

Additional skills:
    Familiarity with tools for data management
    Ability to annotate data sets at a higher level to keep track of changes and analyses performed

Faculty and student ratings:*
    Faculty average = 4.00
    Student average = 4.47

*Ratings based on a 5-point Likert scale: 5 = essential; 4 = very important; 3 = important; 2 = somewhat important; 1 = not important.


Faculty Responses 

Faculty described data management skills 
as standard operating procedures passed on 
from one student to the next. They believed 
that students gain rudimentary skills in data 
management in statistics courses prior to 
their graduate school career. “Learning by 
doing” was cited by many faculty as how stu- 
dents obtained these skills. If students were 
not proficient in this area, several problems
arose, including code overwrites, haphazard
organization, and the inability to locate spe- 
cific data. Faculty also cited participation in 
internships as a way that students obtained 
proficiency. 

Data management plans ranked as very im- 
portant; however, faculty clarified that students 
should be able to follow them rather than develop
and create them. When it came to the life cycle 
of data, faculty had different perspectives. One 
believed that students did not necessarily have 
to understand the life cycle to manage the data. 
Another cited the data life cycle as the reason 
students lacked skills: they did not see the full 
picture of why data management and organiza- 
tion becomes important further in the data life 
cycle. Another faculty member maintained that 
it was important for students to understand the 
entire process so that they can backtrack if a 
mistake is made. 


Student Responses 

Students rated data management and organiza- 
tion skills as the highest competency in terms of 
importance. In general, the students described 
the processes of data management and not 
necessarily the reasons behind it. For exam- 
ple, most students kept copies of their data in 




TABLE 3.5 Faculty and Student DIL Competency Ratings of Importance:
Data Preservation

Competency-related skills:
    Recognizes the benefits and costs of data preservation
    Understands the technology, resources, and organizational components of
        preserving data
    Utilizes best practices in preparing data for its eventual preservation during its
        active life cycle
    Articulates the potential long-term value of own data for self or others and is able
        to determine an appropriate preservation time frame
    Understands the need to develop preservation policies and is able to identify the
        core elements of such policies

Additional skills:
    None

Faculty and student ratings:*
    Faculty average = 3.57
    Student average = 3.75

*Ratings based on a 5-point Likert scale: 5 = essential; 4 = very important; 3 = important; 2 = somewhat important; 1 = not important.


multiple locations, but the ad hoc methods of 
saving created confusion rather than security. 
Almost all students stated that they learned 
data management skills through trial and error. 
They learned through word-of-mouth about 
standards for managing and organizing their 
data, if they existed at all. Of the 15 students, 
9 mentioned that there were no formal poli- 
cies or that they did not know of any in place 
for managing the data in their lab (2 students 
did not respond to the question). Even those 
students working in labs with policies were un- 
aware of formal standards in the discipline. The 
students recognized organization of data as an
issue in day-to-day tasks. For example,
it was difficult for one student to locate
particular files. That student reported occasion- 
ally needing to go back and rerun coding to 
find the authoritative version. 


Data Preservation


Table 3.5 summarizes the results of our inter-
viewee responses regarding the data preserva-
tion competency.


Faculty Responses 

Depending on context, data preservation was 
considered either “essential” or not a ma- 
jor concern for faculty. Faculty whose work 
included sustainability of results over time 
tended to view preservation of their data as a 
priority. Other faculty saw the importance of 
preservation in theory, but did not necessarily 
see the need to take action to preserve their 
data. Faculty noted a lack of student knowl- 
edge or interest in this area. One faculty mem- 
ber mentioned a need for more resources to tell 
students about current best practices. Some 
faculty reported that they themselves did not 
have strong knowledge in this area. One rated 
this competency as both “important” and “I 
don’t know,” as he felt he did not fully under- 
stand data preservation. Another faculty mem- 
ber reported that since technology changed so 
quickly, some of the data would become obso- 
lete quickly. 


Student Responses

Many of the students were unsure of a long-
term use for their data. Students gave a range
of responses when asked how long their data set
should be preserved (see Table 3.6). 

The length of preservation of data differed 
among the labs. For example, the students in 
the natural resources lab recognized the unique 
quality of their research and their role in sup- 
porting long-term research, and answered “in- 
definitely” to the question. Students in the ag- 
ricultural and biological engineering lab were 
generally less certain of the long-term value of 
the data. Four of the five students responded 
either “less than 3 years” or “I don’t know”
to the question. There was some uncertainty 
about what was being done to preserve the data 
in the civil engineering lab. Two students indi- 
cated that no steps were being taken to preserve 
the data, one indicated that steps were being 
taken, and one did not know. Overall, students 
believed that the principal investigator, others 
in the lab, or a data repository handled data 
preservation. 


Data Processing and Analysis 


Table 3.7 summarizes the results of our inter- 
viewee responses regarding the data processing 
and analysis competency. 


Faculty Responses 
Data processing and analysis is considered a di- 
rect component of conducting science in most 
disciplines; therefore it received the highest rat- 
ing of importance by faculty. Overall, respon- 
dents viewed this competency as critical for 
students to avoid mistakes in evaluating data 
and to gain efficiency in their work. Several fac- 
ulty mentioned that students were unfamiliar 
with processing and analysis tools in the lab as 
well as within their discipline. 

Faculty estimated that their students’ skill lev- 
els in this competency ranged from “not system- 
atic” and “inefficient” to “highly experienced” 


TABLE 3.6 How Long Should Your Data Set Be Preserved? (n = 17)

Student Response                              n
I don’t know                                  4
Less than 3 years                             2
10-20 years                                   2
20-50 years                                   3
50-100 years                                  1
For the life of the bridge being studied      1
Indefinitely                                  4


upon entering the program. One faculty mem- 
ber described students as good in this area, but 
not necessarily efficient, meaning that it took 
students longer than it should to perform tasks. 
Potential resources for graduate students in- 
cluded workshops and classes, but peer-to-peer 
learning was noted as most influential. Another 
faculty member responded that he did not typi- 
cally teach these skills because students absorbed 
the material better by engaging with it them- 
selves—even though they may fail repeatedly. 

As with many of the competencies, the na- 
ture of training depends on local and disciplin- 
ary practices and culture. There was an em- 
phasis on developing processing and analysis 
skills and critical thinking through personal 
engagement with the data and tools. Some of 
the pathways to skill acquisition mentioned 
were peer-to-peer and advisor contacts; formal 
courses, such as statistics; and self-teaching/ 
trial and error. 


Student Responses 

As with faculty, students recognized that these 
skills were generally at the core of scientific 
practice in their domains. One student from 

TABLE 3.7 Faculty and Student DIL Competency Ratings of Importance:
Data Processing and Analysis

Competency-related skills:
    Familiar with the basic data processing and analysis tools and techniques of the
        discipline or research area
    Understands the effect that these tools may have on the data
    Uses appropriate workflow management tools to automate repetitive analysis of
        data

Additional skills:
    None

Faculty and student ratings:*
    Faculty average = 4.63
    Student average = 4.35

*Ratings based on a 5-point Likert scale: 5 = essential; 4 = very important; 3 = important; 2 = somewhat important; 1 = not important.


the ecology lab commented: “One of the—I 
think—biggest mistakes that people make in 
our field is improperly analyzing data.” Stu- 
dents indicated that they were asked to perform 
a wide variety of tasks in processing and ana- 
lyzing data. Several students reported teaching 
themselves to use tools to perform these tasks. 
Statistical programs dominated the list of tools 
that students described (R, SPSS, SAS), as did 
Microsoft Excel. In addition, they described 
a variety of other programs and tools for col- 
lecting and transforming data specific to the 
particular research domain and project, includ- 
ing ArcGIS, data loggers, ENVI for analyzing 
Landsat images, MATLAB, and various coding 
languages such as Python and C++. 


Data Quality and Documentation 


Table 3.8 summarizes the results of our inter- 
viewee responses regarding the data quality and 
documentation competency. 


Faculty Responses 

Many faculty felt that their students knew to 
check for any discrepancies in their data to re- 
solve issues before analysis; however, faculty did 
not express much confidence in their students’
abilities to do the job well, nor to document
the steps taken. One interviewee commented 
that it was “very hard to motivate students to 
write documentation,” mostly because the stu- 
dents’ focus was not on reproducibility, but on 
getting the work done and graduating. Faculty 
described self-documentation of code (a log of 
commands used and the parameters) as being 
important so that students could reproduce 
results. Another faculty member cited a lack
of tools for automating the process as a real
challenge. This interviewee also noted that
students consistently found themselves more
concerned with the outputs of an experiment
than with the steps taken to get to the
outputs. Still another faculty interviewee was 
confident that students were learning the skills 
needed to write the methods section of a paper, 
but that there was not enough documentation 
concerning the research process itself. This in- 
terviewee felt that students were overconfident 
when it came to artifacts and corruptions, 
and that they generally thought that their 
data was in good shape. One of the labs used 
error-checking procedures to ensure that mea- 
surements fell within known boundaries. The 
students in this lab participated in basic data 
quality checks, which included steps to ensure 

TABLE 3.8 Faculty and Student DIL Competency Ratings of Importance:
Data Quality and Documentation

Competency-related skills:
    Recognizes, documents, and resolves any apparent artifacts, incompletion, or
        corruption of data
    Utilizes metadata to facilitate an understanding of potential problems with data
        sets
    Documents data sufficiently to enable reproduction of research results and data
        by others
    Tracks data provenance and clearly delineates and denotes versions of a data set

Additional skills:
    None

Faculty and student ratings:*
    Faculty average = 4.63
    Student average = 4.12

*Ratings based on a 5-point Likert scale: 5 = essential; 4 = very important; 3 = important; 2 = somewhat important; 1 = not important.


that measurements were not out-of-bounds. 
Five out of the seven faculty we interviewed 
reported using some kind of version control 
practices in the lab, whether a specific system 
such as Subversion (SVN) or SharePoint, or file 
naming practices that included the version. 


Student Responses 

Overall, the students were aware of and/or par- 
ticipated in quality control steps. Out of 16 
students, 14 felt that they created a sufficient 
amount of documentation for someone with 
similar expertise to understand and use their 
data (1 student did not provide a response). 
However, this may reflect one faculty member’s 
assertion that students were overconfident in 
this area. Students in the computer engineer- 
ing program were aware that this is an area that 
could benefit from “drastic improvement” (in 
the words of 1 student), but they also reported 
that their faculty advisor stressed documenta- 
tion of the steps taken during research. For 
them, logging of calculations, thoughts, and 
the entire research process began early. These 
students were also more likely to use version- 
ing software; students in ecology and natural 
resources were more likely to use file naming
strategies for versioning. They learned these
skills through trial and error, from peers, and 
from the principal investigator. All 16 of the 
students who provided a response planned to 
leave a copy of their data with their advisor af- 
ter they graduate. 


Data Visualization and Representation 


Table 3.9 summarizes the results of our inter- 
viewee responses regarding the data visualiza- 
tion and representation competency. 


Faculty Responses 

Faculty saw data visualization and representation 
as a critical competency for students to master. 
They identified a need for more advanced in- 
struction for students to learn how to create ef- 
fective, and ethical, graphical representations of 
data. Several of the faculty reported that students 
learned the mechanical aspects of using visual- 
ization tools, but were not as skilled in knowing 
what makes a good visualization. As one faculty 
member stated, “visualization is communica- 
tion.” Students also struggled in making use of 
representations to evaluate the quality of their 
data or to “impact a specific decision.” 

TABLE 3.9 Faculty and Student DIL Competency Ratings of Importance:
Data Visualization and Representation

Competency-related skills:
    Proficiently uses basic visualization tools of discipline
    Avoids misleading or ambiguous representations when presenting data in tables,
        charts, and diagrams
    Chooses the appropriate type of visualization, such as maps, graphs, animations,
        or videos, on the basis of an understanding of the reason/purpose for
        visualizing or displaying data

Additional skills:
    Understands the mechanics of specific data visualization software programs

Faculty and student ratings:*
    Faculty average = 4.63
    Student average = 4.35

*Ratings based on a 5-point Likert scale: 5 = essential; 4 = very important; 3 = important; 2 = somewhat important; 1 = not important.


Faculty reported that students received little 
to no formal training in this area as graduate 
students. Instead, students used the skills they 
acquired from undergraduate course work, along with
their intuition to create visualizations and rep- 
resentations of their data. There were some ex- 
ceptions. One faculty member recommended 
a book on the topic to incoming students. An- 
other faculty member taught advanced tech- 
niques in the lab. 


Student Responses 

Student responses indicated a general recogni- 
tion of the importance of data visualization to 
convey their findings in publications and other 
venues. All 17 of the students we interviewed 
indicated that they generated visual represen- 
tations of their data. Several students men- 
tioned the need to connect their work to their 
intended audiences. One student mentioned 
that “it’s pretty much impossible to interpret 
the data without turning it into something.” 
Students reported informal training on data 
visualization—advisors, lab mates/peers, and 
online help were resources for learning. Stu- 
dents mentioned a desire for software-specific 
instruction for creating their data visualizations
in R, MATLAB, Python, GMT, ArcGIS, Excel,
SPSS, GIMP, and SigmaPlot. 


Databases and Data Formats 


Table 3.10 summarizes the results of our inter- 
viewee responses regarding the databases and 
data formats competency. 


Faculty Responses 

Faculty stated that students needed compe- 
tency with databases and data formats but that 
their abilities were generally underdeveloped. 
Faculty gravitated to the “databases” elements 
of this competency rather than the more gen- 
eral “data formats” aspects. This may be due to 
the order in which we presented our informa- 
tion; however, it can also be inferred that not 
every faculty member interviewed employed 
databases in his or her work. Not surprisingly, 
those who did tended to give a higher overall 
rating of importance to this competency than 
those who did not. 

Of the faculty who discussed databases, 
most mentioned understanding how to query 
databases as an important skill for students. 
Any faculty thoughts and concerns about 

TABLE 3.10 Faculty and Student DIL Competency Ratings of Importance:
Databases and Data Formats

Competency-related skills:
    Understands the concept of relational databases and how to query those
        databases
    Becomes familiar with standard data formats and types for discipline
    Understands which formats and data types are appropriate for different research
        questions

Additional skills:
    Understands how to maximize performance of databases based on own design

Faculty and student ratings:*
    Faculty average = 3.71
    Student average = 3.88

*Ratings based on a 5-point Likert scale: 5 = essential; 4 = very important; 3 = important; 2 = somewhat important; 1 = not important.


databases were generally shaped by the way 
that they themselves made use of them in their 
labs. For example, the natural resources faculty 
member commented that without the use of 
databases, it’s as if his data does not exist. In 
contrast, the agricultural and biological engi- 
neering faculty member was striving to incor- 
porate all of the lab’s data sets into a database 
and noted that both he and his students needed 
to spend more time learning about the capa- 
bilities of databases. Some fields offer courses 
in databases, and faculty expect students to
take these courses and to know how to work
with databases prior to joining the lab. The fac- 
ulty we interviewed from fields in which such 
courses are not offered speculated that students 
acquired skills by working with others, rather 
than through formal classroom experience. 


Student Responses 

Students handled a variety of data formats in 
their respective labs. The vast majority of stu- 
dents used Microsoft Excel or .csv files, as well 
as ASCII text file formats. Other data formats 
mentioned were Microsoft Access databases, 
MATLAB files, images (TIFF and JPEG), raster
data, SPSS files, SigmaPlot, and NetCDF,
as well as the programming languages C and
C++. Students tended not to focus on the data 
formats in the interviews. Therefore, they did 
not discuss larger issues in formatting data and 
databases in depth. 


Discovery and Acquisition of Data 


Table 3.11 summarizes the results of our in- 
terviewee responses regarding the discovery and 
acquisition of data competency. 


Faculty Responses 

Overall, faculty rated discovery and acquisi- 
tion of data lowest of the 12 competencies. 
The assignment of importance to these skills 
seemed to align to the degree to which the 
individual and team used external data for 
research. Two of the faculty we interviewed 
indicated that the data they used were gener- 
ated entirely in their labs, and they assigned 
a lower rating to this competency. Others
indicated that external data might be brought
into the lab to compare with or augment the
data they generated, or to support an analysis
done in the lab. Faculty used external data
from sources such as the Census

TABLE 3.11 Faculty and Student DIL Competency Ratings of Importance:
Discovery and Acquisition of Data

Competency-related skills:
    Locates and utilizes disciplinary data repositories
    Evaluates the quality of the data available from external sources
    Not only identifies appropriate external data sources, but also imports data and
        converts it when necessary so it can be used locally

Additional skills:
    Understands and navigates data use agreements for reuse of data sets from
        external sources

Faculty and student ratings:*
    Faculty average = 3.57
    Student average = 4.12

*Ratings based on a 5-point Likert scale: 5 = essential; 4 = very important; 3 = important; 2 = somewhat important; 1 = not important.


Bureau, SourceForge, and repositories of geo- 
spatial data. 

Faculty thought that student skills were highly 
variable in this competency. They believed that 
students acquired skills through trial and error 
and consultations with advisors and peers. No 
dominant theme emerged across faculty re- 
sponses, but some valued the ability to evaluate 
data quality and have an “appropriate level of 
skepticism of outside data sources.” Some faculty 
thought that locating and using data sources, if 
necessary, was an easily acquired skill. 


Student Responses 

This competency was highly rated overall by 
students despite a lack of experience for some 
with locating and using data from external 
sources. Students reported that their skills 
were developed primarily from consultations 
with peers and advisors. Students’ experi- 
ences in acquiring data varied. Some found 
data that had been well documented, thus 
making it easy to understand and use. Others
noted that it was difficult to understand the
external data they had acquired, or that the
data used different measurement scales that had
to be converted. Overall, 14 out of 17 students
made use of data acquired outside of their lab.
The major data repositories used by students 
were more varied than those listed by faculty. 
In addition to geospatial data repositories and 
SourceForge, students used the Environmental 
Protection Agency, the National Oceanic and 
Atmospheric Administration, Oregon State 
University’s PRISM Climate Group, and the 
U.S. Department of Agriculture’s Soil Survey 
Geographic (SSURGO) databases. 

Seven out of the 17 students inherited data 
generated from others, reporting both positive 
and negative experiences in the transition. A 
student in computer engineering mentioned 
doing literature reviews as a means of searching 
for code. 


Ethics and Attribution 


Table 3.12 summarizes the results of our inter- 
viewee responses regarding the ethics and attri- 
bution competency. 


Faculty Responses 

Few faculty commented on the “misrepresenta- 
tions of data” component of this competency, 
focusing instead on the citation, intellectual 

TABLE 3.12 Faculty and Student DIL Competency Ratings of Importance:
Ethics and Attribution

Competency-related skills:
    Develops an understanding of intellectual property, privacy and confidentiality
        issues, and the ethos of the discipline when it comes to sharing and
        administering data
    Acknowledges data from external sources appropriately
    Avoids misleading or ambiguous representations when presenting data

Additional skills:
    Identifies what data not to show for privacy purposes

Faculty and student ratings:*
    Faculty average = 4.38
    Student average = 4.35

*Ratings based on a 5-point Likert scale: 5 = essential; 4 = very important; 3 = important; 2 = somewhat important; 1 = not important.


property (IP), privacy, and confidentiality ele- 
ments. Citing data was rated as “essential” to 
“very important” but faculty stated that their 
disciplines lacked standards for citing data. 
Most felt that students were good enough at cit- 
ing data. One of the faculty members felt that 
ethics and attribution were discussed consis- 
tently in the lab and at the university and be- 
lieved that students recognized that ethics ex- 
tended beyond literature and included data sets. 
Two of the faculty felt that students cited out- 
side sources sufficiently. One of them noted that 
students may not know how to cite a data set 
versus a piece of literature, and he himself didn’t 
know of a disciplinary standard for citing data. 

Several faculty noted that graduate students 
received ethics training either at the university 
or departmental level. The majority of the fac- 
ulty noted that the question of who owned the 
data was “somewhat shaky” or “up in the air.”
One of the faculty members we interviewed
felt that ethics training adequately covered
privacy and IP issues, but that more detailed,
practical instruction for handling sensitive data
was necessary. Another stated that students needed to
understand the differences between copyrights, 
trademarks, and patents. 


Student Responses 

Several students reported citing the research 
paper associated with a data set rather than a 
data set itself, although many of the graduate 
students interviewed (11 out of 17 students) 
expressed a general feeling of being competent 
at citing data. It is encouraging that 11 students 
reported receiving training or instruction for 
ethics and IP issues, although they had mixed 
opinions about the usefulness of the training 
about data. Of the 17 students interviewed, only 
3 indicated that they had a good understanding 
of their university’s policies on research data, 
which echoed the faculty’s statements on the 
need for more substantive graduate education 
in this area. One of the computer science stu- 
dents mentioned that the lab sought software 
code with open GNU or BSD licenses to ensure
that they could properly use code generated by 
others. This aligned well with the faculty asser- 
tion that it was very important that these stu- 
dents understood issues with IP and copyright, 
trademarks, and patents. Of potential concern 
was that one student asserted that she didn't 
need to cite external code that she consulted but 
never used outright. About half of the students 
interviewed were not aware of any journals that 

TABLE 3.13 Faculty and Student DIL Competency Ratings of Importance:
Metadata and Data Description

Competency-related skills:
    Understands the rationale for metadata and proficiently annotates and describes
        data so it can be understood and used by self and others
    Develops the ability to read and interpret metadata from external disciplinary
        sources
    Understands the structure and purpose of ontologies in facilitating better sharing
        of data

Additional skills:
    Individuals who publish research must be ready at any point to answer questions
        from others regarding the data set

Faculty and student ratings:*
    Faculty average = 4.57
    Student average = 3.88

*Ratings based on a 5-point Likert scale: 5 = essential; 4 = very important; 3 = important; 2 = somewhat important; 1 = not important.


might accept data sets for publication or as sup- 
plements to a journal article. 


Metadata and Data Description 


Table 3.13 summarizes the results of our inter- 
viewee responses regarding the metadata and 
data description competency. 


Faculty Responses 
Faculty described students as barely proficient 
or worse in the area of metadata and data de- 
scription, and most felt that this was an area 
that needed improvement. Nearly every faculty 
member interviewed (seven out of eight) re- 
ported that the amount of documentation and 
description that their graduate students
currently provided was not sufficient for someone
outside of the lab to understand and make use
of the data. Three of the faculty reported that
they themselves had some trouble understanding
and making use of the data because of the
lack of description. One of the faculty felt
that this competency was of primary importance
and that much could be gained by addressing
the need; he expressed personal interest
in learning more because he was unsure of the 
meaning of the term metadata and felt that a 
lack of knowledge in this area could be damag- 
ing. Another stated that “currently, researchers 
spend more time doing the work than explain- 
ing the work [they] are doing.” For ongoing 
projects in one of the labs in which students pass 
code to other students each semester, the faculty 
member stated that current documentation was 
“definitely” not enough for someone outside of 
the lab to understand and make use of the data. 
Faculty considered this to be a major issue dur- 
ing project transition between semesters. 


Student Responses 

Out of the 17 students interviewed, 12 were 
familiar with the concept of metadata, though 
most stated that they had not received any 
formalized training. Some actually provided 
an inaccurate definition when pressed to ex- 
plain it. (Two confused it with meta-analysis.) 
Student knowledge of metadata evolved from
past projects, trial and error, and, for at
least one graduate student, past work in
industry. For example, a natural resources
graduate student explained that her method 
for describing data had been learned through a 
“personal coping strategy,” meaning, through 
trial and error. One graduate student familiar 
with metadata noted that the metadata he cre- 
ates often is not detailed because he “doesn’t 
have enough time.” Several students reported 
no trouble understanding the metadata that 
accompanied the external data they have used. 
None of the students reported using a meta- 
data standard, although one student applied a 
standardized taxonomy. 


CONCLUSION 


Overall the DIL competencies were an effec- 
tive means of exploring the environments and 
needs of our faculty partners and their students. 
The DIL competencies were not intended to 
serve as a universally applied set of skills or as 
prescriptive standards. The DIL competencies 
will continue to evolve as we learn more about 
disciplinary and local practices. Chapter 10 
addresses future directions for developing the 
DIL competencies. 

We observed many commonalities between 
faculty and students from different fields of 
study and from different academic institutions. 
Conducting interviews informed not only our 
respective DIL programs but also our collective 
understanding of the environments in which 
research data are generated, administered, and 
utilized. As an artifact of the research process, 
data sets are reflections of the decisions and ac- 
tions made consciously or unconsciously by 
humans. Understanding the environments,
challenges, and needs of the people who work
with data is an integral part of developing
educational programs about data. The next section
of this book presents the work of the five DIL
project teams and describes the specific findings
from their interviews and the teams’ responses to
those findings. These case studies illustrate how
important the interviews were to the success of 
the DIL project. 


NOTE 


Portions of this chapter are reprinted from 
Carlson, J., Johnston, L., Westra, B., & Nich- 
ols, M. (2013). Developing an approach for 
data management education: A report from 
the Data Information Literacy project. Interna- 
tional Journal of Digital Curation, 8(1), 204- 
217. http://dx.doi.org/10.2218/ijdc.v8i1.254
INTRODUCTION


The Cornell University Data Information Lit- 
eracy (DIL) project team worked with a fac- 
ulty member and graduate students in natural 
resources. The faculty member's lab collects 
data on longitudinal changes in fish species 
and zooplankton—namely their occurrence, 
population abundance, growth, and diet—in 
Lake Ontario. After interviewing the faculty 
member, a former student, and a lab techni- 
cian, we determined that the DIL needs for 
this area were primarily data management and 
organization and data quality and documenta- 
tion, including metadata and data description. 
We also placed a secondary focus on databases 
and data formats, data visualization and rep- 
resentation, and cultures of practice, including 
data sharing. 

To address these needs, the Cornell team 
focused on two separate educational tracks. 
The first was a series of DIL workshops, open
to the whole Cornell community, that introduced
data management and data management plans
(DMPs), data organization, and data
documentation. The second
was a 6-week credit course on data manage- 
ment for graduate students in natural re- 
sources taught by the faculty member and the 
data librarian, Sarah J. Wright, in the spring 
of 2013. The course built on the previous 
workshop topics and also included sections on 
data quality, data sharing, data analysis, and 
visualization. 

Assessment for the workshops involved using 
post-instruction surveys. The for-credit course 
assessment included formative “1-minute
papers,” very short, anonymous exercises
performed at the end of each class; instructor
feedback on active learning exercises, including
an optional DMP exercise graded by a rubric
(see Appendix A to this chapter); and a final survey
that asked students to self-report on perceived 
skills before and after taking the class. The 
feedback was generally very positive, with the 
majority of students in the credit course indi- 
cating that they would recommend it to other 
graduate students in natural resources. They 
also reported an increase in their skill levels for 
all outcomes. 

This chapter will discuss the Cornell case 
study and our instructional approaches. The 
strengths of our program were that we 


• introduced graduate students to major
  concepts in data management;
• built and gathered modules, exercises,
  and tools that can be used in a range of
  educational situations;
• exposed current gaps in data management
  training;
• allowed students to network and exchange
  information;
• built awareness and relationships with
  faculty.

Ways in which we can improve are to

• provide more hands-on exercises so that
  students can apply the skills they learn to
  their research data;
• tailor the outcomes of the workshops and
  the course to specific skill levels and other
  disciplines;
• build and gather more curriculum resources
  and activities at both low- and
  high-skill levels;
• increase outcomes-based assessment and
  experiment with ways to make sessions
  more student-centered and peer-led.
LITERATURE REVIEW AND
ENVIRONMENTAL SCAN OF DATA 
MANAGEMENT IN NATURAL 
RESOURCES AND ECOLOGY 


The faculty member who worked with our 
Cornell team has a lab that collects data on lon- 
gitudinal changes in fish species and zooplank- 
ton. This faculty member has long been an 
advocate of improving the data management 
skills of graduate students, and therefore was 
a natural partner for this project. Our faculty
member’s concern with data management
reflected general trends in the larger field of
ecology, which has increasingly emphasized data
management and curation at both a macro and 
a micro level. For example, Wolkovich, Regetz, 
and O’Connor (2012) note: 


Because an ecological dataset [is] collected at
a certain place and time [it] represents an
irreproducible set of observations. Ecologists
doing local, independent research possess
. . . a wealth of information about the natural
world and how it is changing. Although
large-scale initiatives will increasingly enable
and reward open science, . . . change demands
action and personal commitment by
individuals—from students and PIs [principal
investigators]. (p. 2102)


A great deal of the literature focused on higher 
level issues, such as big data, cyberinfrastructure, 
and the development of metadata standards, or 
on an individual project as a microcosm of these 
issues. Given the heterogeneous and interdisci- 
plinary nature of ecological data and the need 
for integrative studies in areas such as climate
change, several authors (Carr et al., 2011; Jones,
Schildhauer, Reichman, & Bowers, 2006; Mi- 
chener & Jones, 2012; Wolkovich et al., 2012) 
addressed bioinformatics, ecoinformatics, and 
data sharing writ large, including the current 
state of the art and the need for better data man- 
agement and coordination between various areas 
of ecological research. Others (Gil, Hutchison, 
Frame, & Palanisamy, 2010; Michener, Brunt, 
Helly, Kirchner, & Stafford, 1997) explored the 
variety of metadata standards for ecological data, 
the need for structured metadata and crosswalks 
to facilitate integration and interoperability of 
heterogeneous data sets, and the existing and 
needed partnership efforts necessary to advance 
this. In other cases, the literature outlines cyber- 
infrastructure needs for long-term ecological re- 
search, including particular technical solutions 
and issues with data collection, modeling, and 
management, such as the difficulties of collect- 
ing and harvesting heterogeneous data from a 
network of sites, building cross-searchable digi- 
tal repositories, and accurately modeling with 
existing data (Barros, Laender, Goncalves, Cota, 
& Barbosa, 2007; Magnusson & Hilborn, 
2007; McKiernan, 2004). Institutions such as 
The Long Term Ecological Research Network 
(2012; Michener, Porter, Servilla, & Vanderbilt, 
2011), DataONE (n.d.a), the Knowledge Net- 
work for Biocomplexity (2005), and for limnol- 
ogy the Global Lake Ecological Observatory 
Network (n.d.) championed high-level efforts 
toward providing researchers with centralized 
repositories, resources, tools, and training to 
address data management needs. For example, 
the Ecological Metadata Language (EML) and 
data management tools such as Morpho from 
the Knowledge Network for Biocomplexity are 
standards and tools that are widely available 
(Fegraus, Andelman, Jones, & Schildhauer, 
2005; Knowledge Network for Biocomplexity,
n.d.). 

Among the natural resources graduate stu- 
dents we interviewed, there was a lack of aware- 
ness of existing practices, tools, or standard best 
practices in other areas, as well as a demand for 
point-of-need information and instruction at a 
very basic level. Although compilations of ba- 
sic guidelines exist, such as those published in 
the Bulletin of the Ecological Society of America 
(Borer, Seabloom, Jones, & Schildhauer, 2009) 
and the DataONE (n.d.a) Best Practices data- 
base, the information on data management and 
curation practices is scattered across various 
publications, websites, and training curricula. 

Similarly, an environmental scan of data 
management and curation at Cornell Univer- 
sity revealed that the available resources, train- 
ing, and services on data management at Cor- 
nell are scattered (Block et al., 2010). Hence, 
Cornell formed the Research Data Manage- 
ment Service Group in 2010 to be “a collab- 
orative, campus-wide organization that links 
Cornell University faculty, staff and students 
with data management services to meet their 
research needs” (Research Data Management 
Service Group, n.d., “Mission”). In the area 
of formal graduate student training, our scan 
found that several workshops and classes are 
available that cover various components of data 
management, and it is conceivable that pieces 
of the process may be addressed in research 
methods classes and research labs. For exam- 
ple, in the Department of Natural Resources 
at Cornell, there are courses that cover basic 
biological statistics, wildlife population analy- 
sis, hydrologic data and tools, data collection 
and analysis for forest and stream ecology, and 
spatial statistics. Other departments across the 
College of Agriculture and Life Sciences have 
courses that address geographic information
systems (GIS), remote sensing, spatial modeling
and analysis, temporal statistics, genomics,
and bioinformatics. In terms of non-curricular 
opportunities, units such as the Cornell Uni- 
versity Library, Cornell Statistical Consult- 
ing Unit, and Cornell Institute for Social and 
Economic Research offer open workshops and 
consultation on GIS, basic data analysis, Bayes- 
ian statistical modeling, multilevel modeling, 
logistic regression analysis, linear regression 
parameters, path analysis, mediation analysis, 
experimental design, longitudinal data analy- 
sis, and other statistical techniques, as well 
as training on GIS software packages such as 
ArcGIS and Manifold, and statistical software 
such as SAS, SPSS, Stata, and R. However, de- 
spite these opportunities, there is still a lack of 
comprehensive training that addresses the ma- 
jor elements of data management for natural 
resources students in a holistic fashion. 


CASE STUDY OF GRADUATE STUDENT 
DATA INFORMATION LITERACY NEEDS 
IN NATURAL RESOURCES 


To discover more about data management 
needs at Cornell University, we used the DIL 
interview protocol (available for download at 
http://dx.doi.org/10.5703/1288284315510) 
to interview the faculty member in natural re- 
sources, one of his former graduate students, 
and a current lab technician during the period 
of March through May 2012. Each participant
rated how important DIL skills were to their 
data. The following section provides an over- 
view of the responses we received. 

The lab performed longitudinal studies of fish 
and zooplankton species. Some of the data sets 
contained information collected over decades, 
emphasizing the crucial need for data curation
and maintenance over the extended life span of
the data. Because these longitudinal data cannot
be reproduced, a more formalized approach
to data curation and management would be of
great utility to students in the lab. The faculty
member and lab staff also used databases extensively
to organize and manage their longitudinal
data sets. For this reason, they described acquiring
the data management and organization skills
necessary to work with databases and data formats,
document data, and handle accurate data
entry as essential (see Table 4.1). Otherwise, as
the faculty member memorably stated, “it’s [as
if] the data set doesn’t exist.”


TABLE 4.1 DIL Competency Ratings of Participants in Natural Resources
Case Study (n = 3)

DIL Competency                        Faculty Member      Former Graduate Student  Lab Technician
Discovery and acquisition of data     Somewhat important  Essential                Very important
Databases and data formats            Essential           Essential                Important
Data conversion and interoperability  Essential           Essential                Very important
Data processing and analysis          Essential           Essential                Very important
Data visualization and representation Essential           Essential                Important
Data management and organization      Essential           Essential                Essential
Data quality and documentation        Essential           Essential                Essential
Metadata and data description         N/A                 Essential                Important
Cultures of practice                  Important           Essential                Essential
Ethics and attribution                Essential           Very important           Essential
Data curation and reuse               Very important      Essential                Very important
Data preservation                     Important           Essential                Important


Interviewees noted data conversion and interoperability
as a particularly important skill
for importing data into statistical packages.
Two out of three of our respondents mentioned
that they lacked an understanding of
the differences between raw and processed data
and how they were used. The faculty member
felt that students lacked a good understanding
of data visualization theory, an interesting
emerging area. Less important to the faculty
member was that students had an understanding
of how to access external data (other than
geospatial data), how to find and evaluate data
repositories, and version control. The reasons
varied: in some cases the faculty member felt
that there was little need for the skill on that
particular project; the students learned the skill
informally (e.g., finding external data or data
repositories through trial and error); or one or
two people in the lab handled the task for
everyone (e.g., entering data into Excel and the
Access database).

Metadata was of high importance to all of 
our interviewees. When asked about metadata, 
the faculty member responded that he wasn’t 
even sure what it meant; however, he hoped to 
learn about it over the course of the collabora- 
tion. The former graduate student and the cur- 
rent lab technician placed even more emphasis 
on data documentation and description skills 
than the faculty member. The lab technician 
attributed much of the documentation and 
description he performed to a “personal cop- 
ing strategy,” so that when he came back to the 
data later he could understand what he did and 
where he was in the process. 

The former graduate student indicated that 
accessing and using external data sets, depos- 
iting data into repositories, data preservation, 
and intellectual property were important ar- 
eas of knowledge. He learned most of what he 
knew through trial and error, from colleagues, 
and in peer-to-peer learning. Perhaps this was 
one of the reasons that he was adamant about 
best practices and training students early in 
graduate school. In answering our question 
about what he wished he’d known or been
taught before graduate school, he said: 


By graduate school, that’s the point in which
you are putting data in [spreadsheets], [so]
your best management practices should be in
place. But I recognize they’re probably not.
. . . So [data management skills] should be
the very first thing you learn when you come
to grad school.


When asked about the importance of the
DIL skills, the former graduate student listed all
as essential (see Table 4.1) but noted that some
were covered better than others. For example,
skill development in the discovery and acquisition
of data was handled fairly well, but he found
education about databases and data formats and
data conversion and interoperability to be in its
infancy. Within certain skill sets, like data
processing and analysis, the degree program included
tools, techniques, and their effects on
interpretation, but did not include more advanced
concepts like workflow management tools. He
also noted that there was a lack of norms, or
weak norms, in the field regarding its cultures of
practice. There was a need for those in the field,
especially faculty and principal investigators (PIs)
of research projects, to push for higher standards
in data management. He felt that
most of the outcomes he mentioned as essential
were taught poorly or not at all.

In fact, across most of the competencies 
discussed, lack of formal training for ac- 
quiring important skills arose as a common 
theme. The student and technician noted that 
they acquired most of their skills informally, 
especially in areas such as generating visual- 
izations and ascribing metadata to files, as 
there was no formal on-campus training and 
few readily identifiable people with expertise. 
Although there were classes and workshops 
available, students were not aware of them 
and were more receptive to just-in-time train- 
ing or troubleshooting. When we discussed 
the availability of Cornell courses to learn 
about R, one respondent said, “I don’t know if 
there are actual courses on it. I imagine there 
are somewhere, but I haven't pursued that and 
I don’t know that I really have time to take
a course.” The student described the optimal 
situation as one where he would have access 
to an expert who was using R in a similar way, 
much like the library has a GIS librarian avail- 
able for GIS users. 

There were some disconnects between what 
we learned from the faculty member and what 
we heard from the lab technician and the former
graduate student. Discovery and acquisition
of external data was only somewhat important
to the faculty member. He felt that “if
they didn’t know these [databases] existed, it 
wouldn't matter,” explaining that they seldom 
used external data in their research. However, 
the student and the lab technician reported 
using external data and exhibited limited 
knowledge of disciplinary repositories. Our 
discussion of cultures of practice skills followed 
the same path: it had less importance to the 
faculty member, but was essential to the stu- 
dent and the lab technician. The former grad- 
uate student’s level of awareness of the skills 
and their necessity was very high, especially 
since he had had a great deal of experience as 
an administrator of a large data set. For ex- 
ample, the faculty member and the lab tech- 
nician placed less emphasis on understanding 
formal metadata standards and data preserva- 
tion (counting them as important, but not 
essential), in contrast to the former graduate 
student and what we found in the environ- 
mental scan and literature review. They also 
did not mention workflows or tools like Mor- 
pho a great deal. This disconnect between fac- 
ulty and student views is unsurprising, since 
faculty tend to assume everyone understands 
the culture that they've been embedded in for 
years. Additionally, those who are not data- 
base administrators or who have not had oc- 
casion to need certain skills will naturally tend 
to downplay their importance. 

While respondents considered nearly all of 
the skills we covered in our interview impor- 
tant, those that were not as highly prioritized 
included discovery and acquisition of data and 
data preservation. Interestingly, there were a 
few differences in opinion between our faculty 
collaborator and the others we interviewed. 
The most dramatic difference was around
discovery and acquisition of data, which the
student and the lab technician felt was very 
important or essential. In contrast, our faculty 
collaborator felt that students should already 
have a good grasp of where to obtain data sets 
and therefore considered it only somewhat 
important (with the lowest rating of any of 
the competencies). Cultures of practice was an- 
other example of a competency that the fac- 
ulty member felt the students should under- 
stand (and he rated it as “important”). This 
is one that the student and the lab technician 
felt was essential and needed to be addressed 
in educational interventions. 


A TWOFOLD INSTRUCTIONAL 
APPROACH TO DATA INFORMATION 
LITERACY NEEDS 


In fall 2012 and spring 2013 we implemented 
instructional interventions based on our find- 
ings to address the gaps that we found in the 
curriculum covering data management skills. 
Given the wide range of competencies of inter- 
est to the faculty and students interviewed, the 
Cornell DIL team narrowed the skills down ac- 
cording to the following principles: 


1. Did the competency address a gap we
found in the curriculum?

2. Did we have the expertise to address the 
need? If not, could we include someone 
else who did have the expertise? 

3. Where could we add the most value? 


After asking these questions in concert with 
our faculty collaborator, the four DIL-related 
areas on which we focused were data manage- 
ment and organization, data analysis and visu- 
alization, data sharing, and data quality and 
documentation. Our instructional approach was
twofold: in the fall we offered workshops in the 
library addressing several data management 
topics; in the spring we offered a six-session, 
one-credit course for graduate students in nat- 
ural resources. 


Instruction Approaches: 1-Hour 
Workshops and 6-Week Course 


In the fall, we offered a series of 1-hour 
library-sponsored workshops aimed at gradu- 
ate students in the sciences, each introducing 
a different data management topic. The first 
workshop focused on data management plan- 
ning and was an unqualified success: 30 stu- 
dents attended and we had an additional 13 on 
a wait list. The subsequent workshops had lower 
attendance: 8 attended the data organization 
workshop, 10 attended the data documentation 
workshop, and only 4 signed up for the data 
sharing workshop, so it was canceled. Despite 
the decreased attendance at the later workshops, 
we felt we were successful because the later ses- 
sion subjects were more specific, addressing 
topics that appealed to a more limited audience 
than had the broader workshop on data man- 
agement (see Table 4.2). The students who at- 
tended were active and enthusiastic participants 
and expressed appreciation after the workshops. 

In the spring, the Cornell DIL team offered 
the six-session, one-credit course for graduate 
students in natural resources, Managing Data 
to Facilitate Your Research. The data librar- 
ian and the faculty collaborator co-taught the 
course. The content was similar to the fall se- 
mester library workshops, but we were able to 
build on prior classes as we progressed through 
the material. For example, in the workshop for- 
mat, we introduced the basics of data manage- 
ment as part of each workshop; in the course 
format we introduced data management in
the first session and focused on additional
content in each subsequent class. At the
beginning of each session, we recapped what 
we covered in the last session and offered time 
to respond to questions. Because we listed the 
course through the Department of Natural Re- 
sources, we had a more subject-specific focus 
and drew on examples from ecology and fisher- 
ies research. For example, during the session on 
data analysis and visualization, the faculty col- 
laborator demonstrated linking stable isotope 
data from the Cornell University Stable Isotope 
Laboratory to the master database file from the 
Adirondack Fisheries Research Program. This 
involved discussing data import, linking the 
new table to master tables in the database, de- 
veloping a query, and exporting the data into 
Microsoft Excel. All of these topics could have 
been discussed without the context of real re- 
search data, but using real-life examples drawn 
from the discipline helped the students under- 
stand what was happening in the data man- 
agement process and, more importantly, why 
it should happen. We created a library guide
for the course, available at
http://guides.library.cornell.edu/ntres6940.
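
The four steps named in the demonstration above (import, link to a master table, query, export) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the class demonstration used the program's own master database file and exported to Microsoft Excel, whereas the example below substitutes an in-memory SQLite database and CSV output so that it runs anywhere. The table names, column names, and all values are hypothetical and are not taken from the Adirondack Fisheries Research Program data.

```python
import csv
import io
import sqlite3

# Hypothetical master records and new isotope results (all values invented).
MASTER_ROWS = [
    ("F001", "lake trout", 2011),
    ("F002", "brook trout", 2011),
    ("F003", "lake trout", 2012),
]
ISOTOPE_CSV = """fish_id,d13c,d15n
F001,-28.1,11.4
F003,-27.5,12.0
"""


def build_database():
    """Create an in-memory database holding the master fish table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fish ("
        "fish_id TEXT PRIMARY KEY, species TEXT NOT NULL, year INTEGER NOT NULL)"
    )
    # The new table links to the master table through the shared fish_id key.
    conn.execute(
        "CREATE TABLE isotope ("
        "fish_id TEXT NOT NULL REFERENCES fish(fish_id), d13c REAL, d15n REAL)"
    )
    conn.executemany("INSERT INTO fish VALUES (?, ?, ?)", MASTER_ROWS)
    return conn


def import_isotopes(conn, csv_text):
    """Step 1: import the new results; returns the number of rows loaded."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["fish_id"], float(r["d13c"]), float(r["d15n"])) for r in reader]
    conn.executemany("INSERT INTO isotope VALUES (?, ?, ?)", rows)
    return len(rows)


def query_linked(conn):
    """Steps 2 and 3: join the new table to the master table and query it."""
    sql = (
        "SELECT f.fish_id, f.species, f.year, i.d13c, i.d15n "
        "FROM fish AS f JOIN isotope AS i ON i.fish_id = f.fish_id "
        "ORDER BY f.fish_id"
    )
    return conn.execute(sql).fetchall()


def export_csv(rows):
    """Step 4: export the query result as CSV text that a spreadsheet can open."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["fish_id", "species", "year", "d13c", "d15n"])
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    conn = build_database()
    import_isotopes(conn, ISOTOPE_CSV)
    print(export_csv(query_linked(conn)), end="")
```

Keeping the isotope results in their own table, linked by a key rather than pasted into the master table, leaves the master records unchanged while still letting a single query combine the two.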

We drew on several resources to build the 
course and workshop content. For example, 
DataONE (n.d.b) created education mod- 
ules covering data management topics that 
are openly available at
http://www.dataone.org/education-modules.
We relied heavily on
those that matched our identified needs. We 
did make changes to the slides, adjusting for 
the discipline and for the time allotted. We also 
made use of an Ecological Society of America 
(ESA) publication about best practices in data 
management (Borer et al., 2009). 

Twenty-five students enrolled in the course. 
Most of the students were from the natu- 
ral resources department, though there were 
students from biological and environmental 
TABLE 4.2 Weekly Course Topics and Readings in the Spring 2013 One-Credit
Cornell Course NTRES 6940: Managing Data to Facilitate Your Research


Topic                  Description and Readings

1. Introduction to data management
    We will use the first class session for introductions and logistics. The
    instructors will give a brief explanation of DMPs and reasons for using
    them. We'll then have a group discussion of research, data problems
    encountered, and data management needs.
    Readings:
    Wolkovich, E. M., Regetz, J., & O'Connor, M. I. (2012). Advances in global
        change research require open science by individual researchers. Global
        Change Biology, 18(7), 2102-2110.
        http://dx.doi.org/10.1111/j.1365-2486.2012.02693.x
    National Science Foundation (n.d.). Dissemination and sharing of research
        results. http://www.nsf.gov/bfa/dias/policy/dmp.jsp
    Research Data Management Service Group (n.d.). Data management
        planning: Guide to writing a data management plan (DMP).
        http://data.research.cornell.edu/content/data-management-planning

2. Data organization
    Organizing your data at the front end of a research project will save time
    and increase your ability to analyze data. This session will introduce you
    to the principles involved in creating a relational database and will
    provide examples to help you organize your own data in this manner. Topics
    will include best practices for data entry, data types, how to handle
    missing data, organization by data type, and data file formats.
    Readings:
    Borer, E. T., Seabloom, E. W., Jones, M. B., & Schildhauer, M. (2009). Some
        simple guidelines for effective data management. Bulletin of the
        Ecological Society of America, 90(2), 205-214.
        http://dx.doi.org/10.1890/0012-9623-90.2.205
    Research Data Management Service Group (n.d.). Preparing tabular data for
        description and archiving.
        http://data.research.cornell.edu/content/tabular-data

3. Data analysis and visualization
    Analyze existing data and create graphs using R in order to effectively
    communicate findings.
    Readings:
    DataONE (n.d.). Education modules: Lesson 10—Analysis and workflows.
        http://www.dataone.org/education-modules
    Noble, W. S. (2009). A quick guide to organizing computational biology
        projects. PLoS Computational Biology, 5(7), e1000424.
        http://dx.doi.org/10.1371/journal.pcbi.1000424

Continued


engineering, ecology and evolutionary biology,
crop and soil sciences, and civil and environmental
engineering. The students ranged from
first-year to fourth-year graduate students. Two
faculty and staff members attended. Fifteen
students attended four or more of the six sessions
in the course (see Figure 4.1).

Given that it was only a one-credit, 6-week-long
course, we could only briefly touch
upon the major issues. A mix of higher level,
TABLE 4.2 Weekly Course Topics and Readings in the Spring 2013 One-Credit
Cornell Course NTRES 6940: Managing Data to Facilitate Your Research—cont'd


Topic                  Description and Readings

4. Data sharing
    The NSF and other funding agencies have already adopted data sharing
    policies. Publishers also have data sharing requirements, whether they host
    data themselves, or expect researchers to deposit data in a data center or
    to make it available upon request. So where to share? During this class
    session, we'll discuss disciplinary databases, Cornell’s eCommons digital
    repository, and some other sharing strategies, and will talk about
    evaluation criteria upon which to base your decision about where to share
    your data.
    Readings:
    Center for Research Libraries. (2005). General factors to consider in
        evaluating digital repositories. Focus on Global Resources, 25(2).
        http://www.crl.edu/focus/article/486
    Databib | searchable catalog of research data repositories
        (http://databib.org/index.php)
    eCommons@Cornell (http://ecommons.library.cornell.edu/)

5. Data quality and documentation
    While written documentation—for example, in a lab notebook—is still
    important, the platforms on which modern researchers are working and
    collecting data are increasingly complex. How do you document your digital
    data and the steps you take to analyze it? Are your files sufficiently
    organized and well described so that others can interpret what you've done?
    What about yourself, 3 months from now? During this class session on data
    documentation, we'll discuss the challenge of remembering details relevant
    to interpreting your data, and offer some best practices and strategies to
    adopt in order to organize and describe your data for yourself and others.
    Readings:
    Disciplinary Metadata | Digital Curation Centre
        (http://www.dcc.ac.uk/resources/metadata-standards)
    Kozlowski, W. (2014). Guidelines for basic “readme” style scientific
        metadata.
        http://data.research.cornell.edu/sites/default/files/SciMD_ReadMe_Guidelines_v4_1_0.pdf
    Rudstam, L. G., Luckey, F., & Koops, M. (2012). Water quality in offshore
        Lake Ontario during intensive sampling years 2003 and 2008: Results
        from the LOLA (Lake Ontario Lower Foodweb Assessment) Program.
        http://hdl.handle.net/1813/29691

6. Final wrap-up: data management plans
    For the final class session, participants will have the opportunity to
    present a DMP for peer discussion and review. Depending on interest,
    presentations may range from 6 to 15 minutes.
    Readings:
    Sample DMP from Inter-University Consortium for Political and Social
        Research (ICPSR)
        (http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/plan.html)
    Sample DMPs from University of California San Diego
        (http://idi.ucsd.edu/data-curation/examples.html)
    Sample DMPs from the University of New Mexico
        (http://libguides.unm.edu/content.php?pid=137795&sid=1422879)
Figure 4.1 Self-reported attendance by students enrolled in the spring 2013 one-credit
Cornell course NTRES 6940: Managing Data to Facilitate Your Research (n = 19). 


conceptual articles gave context to our discus- 
sions, along with more practical resources for 
students to explore on their own and pointers 
to Cornell University resources for training and 
just-in-time help. 


Learning Objectives for 
the Cornell Course 


The aim of our instruction for the course was 
to introduce students to data management 
best practices in natural resources and to help 
students create plans to manage their data ef- 
fectively and efficiently while meeting funder 
and publisher requirements. The learning ob- 
jectives were as follows: 

By the end of this course, students will be 
able to 


• describe data management and why it is
  important;
• describe their research and data collection
  process in order to identify their data life
  cycle and complete the initial part of the
  DMP;
• evaluate a DMP to recognize the necessary
  components of a successful plan;
• describe and follow best practices in
  structuring relational databases to make
  analysis and retrieval easier/more efficient
  for long-term studies;
• analyze existing field data and create
  graphs using R to effectively communicate
  findings;
• evaluate disciplinary data repositories to
  determine requirements and fitness for
  data deposit;
• evaluate the annotation/documentation
  accompanying a data set to recognize the
  appropriate level necessary for long-term
  understanding by self and others;
• create a DMP to manage and curate
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TABLE 4.3 Needs and Learning Outcomes Addressed in the Cornell For-Credit 


Class per Session 


Session 


1. Introduction to 


data management 


2. Data 
organization 


3. Data analysis 
and visualization 


4. Data sharing 


Needs Identified 


Basic introduction to data management: 
importance in the research context of 


the audience 


Acquiring the data management and 
organization skills necessary to work 
with databases and data formats, 
document data, and handle accurate 
data entry is described as essential, 
otherwise, “it’s as if the data set 
doesn’t exist” 


A good understanding of higher 
end data visualization, though not 
positioned as currently essential but 
as an interesting emerging area by 
the instructor, was in short supply. 
The lab primarily uses R for data 
analysis and visualization, but 
training is limited, and not aimed 
specifically at students in 
natural resources 


Areas such as accessing external data 
(except for background geospatial 
data) and finding and evaluating 
data repositories were of less 
importance to the faculty member 
than to the students, but the faculty 
member expressed interest in 
learning more about Cornell’s 
institutional repository 


Outcomes Addressed 


Understands the life cycle of data, 
develops DMPs, and keeps track of the 
relation of subsets or processed data 
to the original data sets 

Creates standard operating procedures 
for data management and 
documentation 


Understands the concept of relational 
databases, how to query those 
databases, and becomes familiar with 
standard data formats and types for 
discipline 

Understands which formats and data 
types are appropriate for different 
research questions 


Becomes familiar with the basic analysis 
tools of the discipline 

Uses appropriate workflow 
management tools to automate 
repetitive analysis of data 

Proficiently uses basic visualization tools 
of discipline 


Recognizes that data may have value 
beyond the original purpose, to 
validate research or for use by others 

Locates and utilizes disciplinary data 
repositories 


Continued 


their own data for effective long-term 
use and reuse as well as to meet funding 
requirements. 


TABLE 4.3 Needs and Learning Outcomes Addressed in the Cornell For-Credit 
Class per Session (continued) 

Session 5. Data quality and documentation 

Needs Identified: Skills such as ascribing metadata to files are acquired 
informally; furthermore, the faculty member noted he wasn’t even sure what 
was meant by metadata, and he hoped to learn about it over the course of the 
collaboration 

Outcomes Addressed: 
• Recognizes that data may have value beyond the original purpose, to 
  validate research or for use by others 
• Understands the rationale for metadata and proficiently annotates and 
  describes data so it can be understood and used by self and others 
• Develops the ability to read and interpret metadata from external 
  disciplinary sources 
• Understands the structure and purpose of ontologies in facilitating better 
  sharing of data 

Session 6. Data management plans 

Needs Identified: Funders and other organizations are increasingly requiring 
DMPs, and few graduate students are aware of the components of a good DMP 

Outcomes Addressed: 
• Understands the life cycle of data, develops DMPs, and keeps track of the 
  relation of subsets or processed data to the original data sets 
• Creates standard operating procedures for data management and 
  documentation 
• Articulates the planning and actions needed to enable data curation 


Each session attempted to meet the learning outcomes outlined by the DIL 
project (see Table 4.3). We addressed them through a variety of activities; 
however, we were not able to address all of them in great depth. Some 
sections of the course were more traditional. For example, students read an 
article on effective data management practices (Borer et al., 2009) before 
class and commented to a discussion forum on points they found interesting 
or that needed more clarification. Then we reviewed the comments and 
discussed them in class. 

We considered graduate students to be expert learners; therefore we employed 
collaborative learning techniques, including think-pair-share and group 
problem solving (Center for Teaching Excellence, 2013b). For example, as a 
class activity students discussed their research data life cycle in detail 
and then drew a diagram of the stages of research. For “evaluate 
disciplinary data repositories” students worked in groups to identify 
possible repositories for data deposit for their subject. (See Appendix B to 
this chapter for a full description of the exercise.) For the session on 
data documentation, students worked in groups with examples of metadata and 
evaluated what was done well and what could be improved. Finally, we asked 
those who chose to complete the optional DMP exercise to complete a 
different section of the DMP each week, and participants received feedback 
from the librarian instructors. 


Assessment 

We used a combination of formative and summative assessment tools, including 
1-minute reflections after each session, feedback on outputs from active 
learning exercises, and a final survey (Center for Teaching Excellence, 
2013a; Downey, Ramin, & Byerly, 2008). A 1-minute reflection was 
administered either as a survey after each library workshop, or as a 
discussion question via the course Blackboard site. Figure 4.2 shows a 
typical 1-minute reflection assignment. 

[Screenshot of a Blackboard post by Sarah Wright (labeled Instructor, 
Manager), titled “February 19 One Minute Reflection”: What are the most 
useful things you learned from today’s class? What were your remaining 
questions?] 

Figure 4.2 One Minute Reflection assignment via a course Blackboard site. 

In addition to the 1-minute reflection posts, 
we used the discussion board for students to ask 
questions after each class session. There were 68 
posts, with 21 participants—representing the 
majority of the students enrolled. We gained 
many substantial and useful comments using 
this method. In fact, the comments were so 
useful that it became our practice to review the 
most pertinent comments at the beginning of 
each class as a way to emphasize content from 
the last class or to lead into content for that 
day’s class. After the class on data organization 
and the use of relational databases, we received 
positive feedback from students enthusiasti- 
cally discussing the changes they would make 
due to what they had just learned. 


The “rules of thumb” were a great summary 
of various best practices for data management. 
It was interesting to read that computer code 
was actually a form of metadata in itself. I 
suppose I had never looked at it in that light 
before but from now on I will take my com- 
menting more seriously! I was also grateful for 
the explanation of best practices for relational 
databases. I’ve heard of the term but this paper 
did a great job walking through the formation 
of one, step by step. Finally, I’m finding that 
by taking this class and doing these readings 


I’m becoming more aware of different data 
management services in my own field. 

Three points from Borer et al. (2009) that 
were particularly useful: [1] the merits of us- 
ing scripted analyses. Having used JMP for 
4 years, I know too well the agony of try- 
ing to replicate drop-down menu instruc- 
tions months after doing an analysis. I plan 
to switch to R. [2] standardized file naming 
system using the international date format. 
While I use descriptive folder names, I do 
not always use descriptive file names and I am 
not consistent with date format . . . [which] 
makes searching for files on my computer 
inefficient . . . [and] also means that when I 
send others my data it loses some descriptive 
information . . . [3] full taxonomic names 
in data files. A few years ago I did an experi- 
ment in which I identified 100+ plant spe- 
cies in the field. I used abbreviations in my 
data. Flash forward 3 years, and it took me 
days to reconstruct what all my abbreviations 
were. Some taxonomic names had changed. 


Never again! 
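
The student's second point, file names built around the international (ISO 
8601, YYYY-MM-DD) date format, can be illustrated with a short sketch. 
Everything in it is an assumption made for illustration: the function name 
`standard_file_name`, the project_description_date layout, and the sample 
values are hypothetical and are not a convention taken from Borer et al. 
(2009) or from the course materials. 

```python
import re
from datetime import date


def standard_file_name(project: str, description: str, collected: date, extension: str) -> str:
    """Build a descriptive file name ending in an ISO 8601 (YYYY-MM-DD) date.

    The project_description_date layout is a hypothetical convention.
    """
    def slug(text: str) -> str:
        # Lowercase the text and collapse every run of other characters into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{slug(project)}_{slug(description)}_{collected.isoformat()}.{extension.lstrip('.')}"


# Hypothetical sample values, used only to show the pattern.
names = [
    standard_file_name("Lake Survey", "Water Quality", date(2013, 2, 19), "csv"),
    standard_file_name("Lake Survey", "Water Quality", date(2012, 11, 3), "csv"),
]
```

Because the year comes first and months and days are zero-padded, an 
ordinary alphabetical sort of such names is also a chronological sort, which 
addresses the searching problem the student describes. 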


We received comments that required follow- 
up and more conversation: 


The relational database method seems great 
but will take some getting used to. Is there 
a way to connect excel and access files that 
would allow you to input data and automatically 
update in the relational database? 


Learning about relational databases was very useful. Efficient organization 
of spreadsheets was also helpful. I would like to learn more about how to 
organize metadata, but I think this is an upcoming class discussion. Also, I 
am still lacking clear reasons why Access is preferable to Excel. What does 
Access offer that Excel does not? What are the features that make Access 
particularly useful? 


[Survey question: “How would you rate your knowledge/skills/ability in the 
following areas BEFORE taking this class?” Item shown: “Describing data 
management and its importance and relevance to you.” Slider labels: No 
Competence, Little Competence, Somewhat Competent, Very Competent, Not 
Applicable; scale marks: 0, 25, 50, 75, 100.] 

Figure 4.3 Example self-assessment survey question using a slider scale. 


After reading the last comment, we felt that 
we had not clearly explained the advantages 
of a relational database, so we addressed that 
point at the beginning of the next class. As 
these examples illustrate, the 1-minute reflec- 
tions proved to be a powerful form of forma- 
tive assessment that allowed us to respond to 
the learning needs of the students. 
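
The contrast the student asked about can be made concrete with a small 
sketch. It uses SQLite through Python's standard `sqlite3` module rather 
than Access, and the table names, columns, and values are hypothetical, not 
drawn from the course materials. Under those assumptions it shows two things 
a relational database provides that a single flat spreadsheet does not 
enforce on its own: each site is stored once and linked to its samples 
through a key, and the database rejects a sample that points to a site that 
does not exist. 

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two related tables (hypothetical field data)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE sites (
            site_id   INTEGER PRIMARY KEY,
            site_name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE samples (
            sample_id   INTEGER PRIMARY KEY,
            site_id     INTEGER NOT NULL REFERENCES sites(site_id),
            sample_date TEXT NOT NULL,  -- ISO 8601 date, e.g. 2013-02-19
            temp_c      REAL
        );
    """)
    conn.executemany("INSERT INTO sites VALUES (?, ?)",
                     [(1, "North Bay"), (2, "South Bay")])
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?)", [
        (1, 1, "2013-02-19", 3.5),
        (2, 1, "2013-03-05", 4.5),
        (3, 2, "2013-02-19", 5.0),
    ])
    conn.commit()
    return conn


def mean_temp_by_site(conn: sqlite3.Connection) -> list:
    """Join samples to sites and average the temperature per site."""
    return conn.execute("""
        SELECT s.site_name, AVG(m.temp_c)
        FROM samples AS m
        JOIN sites AS s ON s.site_id = m.site_id
        GROUP BY s.site_name
        ORDER BY s.site_name
    """).fetchall()


def try_orphan_insert(conn: sqlite3.Connection) -> bool:
    """Attempt to add a sample for a nonexistent site; return True only if it was accepted."""
    try:
        conn.execute("INSERT INTO samples VALUES (4, 99, '2013-04-01', 6.0)")
    except sqlite3.IntegrityError:
        return False
    return True
```

Calling `mean_temp_by_site(build_demo_db())` summarizes the samples by site 
name without the name ever being retyped per row, and `try_orphan_insert` 
returns `False` because the foreign key constraint blocks the bad row. 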

We also provided active learning exercises 
so that students could receive outcomes-based 
assessments of their work and understanding. 
Some of these were in-class exercises that we 
collected and delivered feedback on for the stu- 
dents. Others were optional out-of-class assign- 
ments, which included rubrics for assessment. 
Though few students completed the optional 
assignment (n = 5), all who tried it found it 
useful; those who didn’t complete it indicated 
that it probably should be required in the class. 
In most cases, we simply discussed what stu- 
dents found during the exercises and gave feed- 
back during discussion. 

Finally, we administered a self-assessment 
survey at the end of the class to gauge the suc- 
cess of our experimental course (see the full 
instrument in Appendix C to this chapter). 


We invited and received constructive criticism 
via the survey instrument, much of which will 
guide our next attempt at offering similar in- 
struction. 

Here, we also asked the students to self- 
evaluate their skill levels concerning the course 
outcomes both before and after taking the 
class (see an example in Figure 4.3). Rather 
than administering separate pre- and post-course evaluations, 
we asked students to rate both their “before” and “after” 
skill levels in a single survey once instruction had ended. This 
method avoided the problem of overestima- 
tion of skill that is common before learning a 
new topic (Kruger & Dunning, 1999). Having 
learned more about the course outcomes, stu- 
dents could then better compare what they ac- 
tually knew at the beginning to what they had 
learned during the course. 

On average, responses (n = 17) showed marked increases in the skills, 
knowledge, and abilities that the students felt they possessed after taking 
the class, as shown in Figure 4.4. However, there was room for improvement 
since on average students rated none of the outcomes in the “somewhat 
competent” to “very competent” range after the course. In fact, several 
outcomes received an average rating of “little competence” and “somewhat 
competent” following the course. And, the most frequently voiced criticism 
of the class was that we touched on a lot of important topics, but we didn’t 
have time to go in-depth and failed to provide enough opportunities to 
practice what we’d discussed. Still, feedback was overwhelmingly positive, 
and the majority of students (13 out of 16) would recommend this course to 
others in natural resources. 

[Bar chart comparing average self-ratings BEFORE and AFTER taking the class 
(horizontal axis marked 10 to 80) for each area: 
• Creating a data management plan in order to manage and curate your own data 
• Documenting your data for yourself and others 
• Evaluating data repositories in order to determine requirements & fitness 
  for data deposit 
• Visualizing data and creating graphs 
• Describing and following best practices in structuring relational databases 
• Recognizing the necessary components of a data management plan 
• Describing your research and data collection process in order to identify 
  your data lifecycle 
• Describing data management and its importance and relevance to you] 

Figure 4.4 Average responses (before and after) to the survey question, “How 
would you rate your knowledge/skills/ability in the following areas?” 
(n = 17). (Note: Due to a technical error in the survey, the student 
response to “Evaluating data repositories in order to determine requirements 
& fitness for data deposit” before taking the class could not be included in 
this figure.) 


RESULTS 


Overall, the response to the workshops and the 
course was very positive. Students reported a 
better awareness of data management skills and 
the resources and tools available to them. One 
student noted, “I think the topic of this class 
is SO ESSENTIAL [to] the way scientific re- 
search is being carried out and shared now. . . . 
[This course] fills a hole in Cornell grad educa- 
tion.” Filling a need in the curriculum is exactly 
what the Cornell DIL team was trying to do, 
and it was gratifying that students recognized 


the importance of the topic and appreciated 
our educational efforts! 

The self-reported increase in skill for all of the 
learning outcomes was another positive outcome 
of the course. The marked increase in students’ 
abilities to articulate the importance of data man- 
agement, to create their own DMP, and to de- 
scribe and document their own data collection 
practices was an important step forward. Their 
comments in the end-of-class survey bore this out 
and indicated their increased awareness of many 
areas of data management. As one student said: 


I think just the exposure to the different aspects of data management and 
the discussion about the usefulness of relational databases and analysis 
software like R can be of great benefit to students, especially those that 
are relatively new to research and may not be aware of the types and 
benefits of resources available to them. 


Benefits for the DIL project included uncovering areas in which there was a 
need for more exploration, such as curation of training resources and 
opportunities, direct instruction on tools (e.g., conversion from Excel to 
database programs, database tools for Mac users, data visualization tools, 
qualitative analysis tools like Atlas.ti), and allowing students to exchange 
information and network with each other. Interestingly, in the final class 
students exchanged information about ad hoc training in data visualization 
in departments beyond natural resources. This shows the library's potential 
role in facilitating peer-to-peer training in addition to the formal, 
instructor-led educational initiatives. The library is experimenting with 
the role of facilitator to crowdsource tips and workflows from students who 
have expertise and to schedule project clinics with interested and skilled 
students and staff. This facilitator role could be fruitfully applied to DIL 
and would address the need to balance a great need for specialized 
instruction with a small library staff that has limited time and skills. 

Before the course ended, the project team 
at Cornell discussed how to continue provid- 
ing data management instruction and what 
could be done to improve it. This project has 
been an exciting experiment, but there is much 
interest beyond the library. Our faculty collaborator 
discussed how to offer this course next time, indicating 
even before we had finished the course that he was 
invested in doing it again. Building a stronger relationship 
with this faculty member and investigating the students’ 
need for hands-on training (in areas that faculty assumed 
the students knew or would learn informally along the way) 
were among the most rewarding parts of this experience. 


The course also gained wider recognition 
among faculty and students; it was the focus 
of a short article in the Cornell Chronicle titled 
“Course Teaches Grad Students How to Man- 
age Their Data,” which sparked inquiries from 
faculty and graduate students in other depart- 
ments (Glazer, 2013). This prompted the li- 
brary to hold more one-time sessions and to 
add modules to online guides that hopefully 
will lead to more course- and curriculum- 
integrated instruction. 

Although the student feedback was very 
positive, there is room for improvement. For 
example, the scope of the course should be 
more focused, and it would work better with 
a smaller group that has a similar level of ex- 
perience. We would like to expand the course 
beyond six sessions, or eliminate content if we 
are unable to increase the number of sessions. 
In the current course, we included more ma- 
terial than we could reasonably cover. These 
changes would also allow us to introduce more 
exercises and to provide more opportunities for 
hands-on learning; the lack of such opportunities was a major 
criticism of the course. Including more practical 
exercises in the course and holding project 
clinics and peer-led workshops would provide 
students the opportunity to experiment with 
and learn using their own research data. These 
formats would also allow students more time 
for discussion and peer exchange around per- 
sonal workflows and existing practices. They 
would make the sessions less prescriptive and 
instructor-led and more student-led and free- 
flowing. Discussions would also allow for more 
just-in-time exchange of information for stu- 
dents who are interested in particular areas, 
and for more advanced students who might 
not want to take a full course. 

With these goals in mind, we plan to pro- 
vide general, beginner-level data management 
library workshops in the fall, open to anyone, 
focused on topics like creating a DMP or writing 
a readme file to describe your data. We'll 
then provide a disciplinary course (possibly in 
other departments that have expressed interest) 
where we can provide more focused, in-depth 
instruction and require active learning compo- 
nents, such as the creation of a DMP. The peer- 
to-peer workshop model and project clinics are 
also a possibility for the future. 

It is clear that DIL skills are important skills 
that graduate students feel are not being taught 
sufficiently in their programs. A former gradu- 
ate student brought up the need for data man- 
agement instruction even earlier, stating, “I 
think it starts as an undergraduate. It’s an easily 
understood discipline at even a high school or 
junior high level, and I would start it that early, 
if possible.” We would like to incorporate data 
management instruction into undergraduate 
laboratory classes, similar to the way we've in- 
corporated information literacy into the cur- 
riculum at multiple points in programs. This 
is a long-term goal that has grown out of the 
current project, and it will require collabora- 
tion and the investment of groups both inside 
and outside the library. 


DISCUSSION 


The Cornell DIL team entered this project 
with a general idea of the DIL competen- 
cies; however, interacting with students and 
teaching the competencies resulted in some 
changes to our original impressions. Much as 
the ACRL’s (2000) information literacy competency 
standards outline high-level outcomes 
for information literacy across an entire cur- 
riculum, the DIL competencies are a starting 
point for articulating what data management 
concepts students should understand and ap- 
ply throughout their research careers. How this 


plays out at varying stages of a researcher’s edu- 
cation and for each discipline is a much more 
detailed and idiosyncratic issue. We found that 
many of the students in the course, especially 
those at the beginning skill levels in a particular 
competency, wanted much more specific (and 
often very tool-based) skills (e.g., how to bet- 
ter use spreadsheet and database packages like 
Excel and Access), rather than the higher level 
conceptual DIL skills, especially in the absence 
of an immediate real-world application (e.g., 
funder data sharing requirements). 

Since the competencies outlined in the DIL 
project covered such a wide range in a quickly 
changing field, they placed an emphasis on the 
recognition and understanding of general best 
practices and much less emphasis on the skills 
needed at the disciplinary and lab/project level. 
Working with the general DIL competencies 
and tailoring them to course and class session 
outcomes forced us to refine and articulate what 
we wanted students to be able to do and how 
we wanted them to demonstrate and apply their 
understanding to their disciplinary-specific situ- 
ation. For example, we recognized that skills 
build in a progression, so we derived the follow- 
ing outcomes from the general DIL competency 
“understands the life cycle of data, develops 
DMPs, and keeps track of the relation of subsets 
or processed data to the original data sets”: 


• Describe research and data collection process to identify data life cycle 
  and complete initial part of DMP 

• Evaluate a DMP to recognize the necessary components of a successful one 

• Create a DMP to manage and curate own data for effective long-term use and 
  reuse as well as to meet funding requirements 


In the course, we briefly addressed tracking 
subsets of data, but addressing this topic alone 
was much more involved than it first appeared. 
This pattern recurred throughout our work with the competencies. 

The range of skill levels in the class and the 
wide variety of types of data with which they 
worked (e.g., quantitative and qualitative; small 
and large data sets in multiple formats) showed 
the need for competencies that progressed over 
time from basic understanding and tool-based 
skills to higher level competencies in analysis 
and synthesis, as well as for outcomes that ad- 
dressed particular disciplines or kinds of data. 
This work is the beginning of that effort. 

Questions we asked ourselves in the process 
of creating the workshop series and the for- 
credit course map well to areas that we need to 
address to integrate DIL competencies into the 
curriculum: 


• What skills do students currently have and where are their most pressing 
  needs? The interviews we conducted with the faculty member and students in 
  natural resources gave us an in-depth view of the skills and attitudes of 
  a very small sample. A larger survey of graduate students and faculty in 
  natural resources (and other disciplines) would give a better idea of the 
  needs of the campus community. 

• What are the gaps in the curriculum? What outcomes are already addressed, 
  where, and at what levels? As part of the environmental scan, we 
  identified the training available, but a closer look at the syllabi of 
  courses that incorporated DIL outcomes and a census of available workshops 
  and other training could help us target our efforts. 

• Do we have the expertise to address student and researcher needs? If not, 
  could we include someone else in or provide staff professional development 
  to gain the missing expertise? It does no good to plan instruction if we 
  do not have the expertise to deliver it, so we asked ourselves: Who is the 
  best person to answer this need? 

• Where can we add the most value? Where can we find partners to supplement 
  areas that are outside our purview? Strategic partnerships with other 
  departments on campus can help reach students at the time of need. 

• What curriculum resources already exist to meet particular DIL outcomes 
  and at what level? Instead of reinventing the wheel, we should try to 
  find, centralize, and adapt available curriculum resources for DIL 
  educational content. A repository or directory of curriculum resources for 
  DIL would be useful. 


CONCLUSION 


We are only beginning to specify the com- 
petencies in DIL that will develop the data 
management skills that future researchers and 
scientists will need, and many barriers to iden- 
tifying them still exist. The rapidly changing 
nature of the field, the heterogeneity of skills 
within the disciplines, and the intensive and 
long-term nature of the task of integrating 
DIL skills within (and alongside) the curricu- 
lum present challenges to academic librarians 
seeking to take on this task. The questions 
posed in our discussion are a start. Similarly, 
the workshop series and for-credit course that 
we piloted at Cornell University are just a 
beginning. And the harsh reality is that it is 
impossible to scale or sustain workshops or 
credit courses to reach graduate students in 
all disciplines. These interventions may work 
best as gateways to introduce students to the 
range of skills they need to acquire through 
other more targeted workshops and classes, 
throughout their academic career. However, 
by taking the lessons learned in these pre- 
liminary initiatives, and by using the modules 
we created or adapted, we can build on this 
foundation to create an integrated, progres- 
sive DIL program that will prepare students 
for the challenges and changes ahead. 


NOTE 


This case study is available online at 
http://dx.doi.org/10.5703/1288284315476. 
APPENDIX A: Rubric for the Spring 2013 One-Credit Course 


Rubric for Evaluating Data Management Plans* 


This rubric includes the National Science Foundation’s requested components 
of a data management plan (DMP). Note that a DMP should be no longer than 
two pages and should be clear and concise. Therefore, it will be very 
difficult to achieve an “excellent” rating for every section of the DMP; 
satisfactory is satisfactory for the majority of the components. A thorough, 
high-quality DMP will contain several “excellent” components and many 
“satisfactory” components. 


DESCRIPTION 

Excellent: Provides brief, nontechnical description of data produced during 
all stages of project (i.e., data collection, processing, analysis, sharing, 
and archiving) 
Satisfactory: Provides brief, nontechnical description of data produced 
during most key stages of project 
Unsatisfactory: Missing or incomplete description of data produced during 
key stages of project that would hinder understanding of data life cycle 

Excellent: Indicates in detail which data will be shared and when for each 
stage of the project; if no data to be shared, states this and indicates why 
not 
Satisfactory: Indicates which data will be shared and schedule at basic 
level; may be lacking detail for some data stages; if no data to be shared, 
states this 
Unsatisfactory: Missing any indication of data to be shared and timeline 

Excellent: Describes in detail impact of data sharing on larger community 
(including examples of possible interdisciplinary use of the research data) 
and how strategy helps to disseminate research to that larger community; if 
no impact or community exists, statement to that effect and explanation 
about why 
Satisfactory: Describes general impact of data sharing on research community 
and how strategy helps to disseminate research to that larger community; if 
no impact or community exists, statement to that effect 
Unsatisfactory: Missing description of data importance; no mention of 
broader community that might benefit from data sharing (or if no impact or 
community exists, no statement to that effect and/or explanation about why) 


CONTENT AND FORMAT 

Excellent: Describes data collection and processing plans in full, 
step-by-step detail (e.g., raw/processed/reduced/analyzed data, software 
code, samples, curricula) 
Satisfactory: Describes data collection and processing plans in general 
detail (e.g., raw/processed/reduced/analyzed data, software/code) 
Unsatisfactory: Missing or incomplete description of data collection and 
processing plans 


Continued 


* Rubric adapted from the Cornell Research Data Management Service Group’s 
Data Management Planning Overview, available at 
http://data.research.cornell.edu/content/data-management-planning. 


Excellent 


CONTENT AND FORMAT—cont'd 


Identifies all file formats used 
throughout the course of the 
project (including those for 
collection, use, conversion, 
and formatting for sharing and 
archiving); selects file formats 
for sharing and archiving that 
maximize potential for reuse 
and longevity; describes plans 
for conversion, if necessary 

Identifies in detail metadata 
(documentation) standards (if 
applicable) or supplementary 
documentation necessary to 
make data understandable; 
indicates who will document 
data and when; explains reason 
for choosing documentation 
strategy 
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Satisfactory 


Identifies most file formats used 
over the course of the project 
(including those for collection, 


use, conversion, and formatting 


for sharing and archiving) 
Identifies basic metadata 
standards and/or basic 


documentation needed to make 


data understandable; lacks 


details of who is responsible for 
documentation and when it will 


occur 


PROTECTION AND INTELLECTUAL PROPERTY (IP) 


Describes full data management 
and storage procedures 
(e.g., identification of storage 
facilities, backup policies 
(including frequency, automated 
or manual), need for secure or 
restricted access, confidentiality 
and privacy issues (including 
anonymizing and protecting 
personally identifiable data, 
and any legal or ethical 
requirements); includes 
explanation of advantages of 
strategy chosen 

Indicates and documents 
licensing and IP for data 
(including use of licenses such 
as Creative Commons or Open 
Data Commons or formal 
policies on data usage and 
creation of derivative works) 

Plans to include full rights 
statements in metadata and/or 
other documentation 


Describes basic data 
management and storage 
procedures (e.g., identification 
of storage facilities, backup 
policies [frequency, automated 
or manual], need for secure or 
restricted access) 

Indicates and documents basic 
policies on data usage, reuse, 


and creation of derivative works 


(e.g., data can be shared and 
reused noncommercially with 
credit) 

Mentions basic reuse 
requirements; may not 
explain how terms will be 
communicated 


Unsatisfactory 


Missing or incomplete description 
of file formats 

Missing or incomplete 
identification of basic metadata 
and documentation 


Missing description of data 
management and storage 
procedures 

Missing any statement on 
licensing and IP policies 

Missing any mention of terms of 
reuse 




Excellent 


ACCESS 


Describes detailed plan and 
infrastructure (i.e., hardware, 
campus services, commercial 
services, or disciplinary 
data centers) for storing and 
providing access to data

Provides detailed description 
of access mechanisms and 
policies, including any 
potential restrictions to access; 
describes the rationale behind 
them; provides a timeline for 
providing access 

Indicates in detail how access 
strategy will maximize the value 
and the discoverability of the 
data to interested audiences; 
provides examples of potential 
audiences 
Satisfactory


Describes basic plan and 
infrastructure (i.e., hardware, 
campus services, commercial 
services, or disciplinary 
data centers) for storing and 
providing access to data 

Provides a general description 
of access mechanisms and 
policies; missing potential 
restrictions, the rationale behind 
them, and applicable timeline 

Indicates in general how access 
strategy will maximize the 
value of the data to interested 
audiences; lacks examples of 
potential audiences 


PRESERVATION AND TRANSFER OF RESPONSIBILITY 


Identifies data to be preserved 
after end of project (including 
thorough explanation of 
selection rationale) 

Describes preservation resources 
(e.g., hardware or campus 
or commercial services, 
institutional commitment or 
funding), selection rationale, 
policies, expertise, and plans 
for transfer of responsibility to 
keep data accessible long term 


Identifies data to be preserved 
after end of project (including 
cursory description of selection 
rationale) 

Describes preservation resources 
(e.g., hardware or campus 
or commercial services, 
institutional commitment or 
funding) and plans for transfer 
of responsibility to keep data 
accessible long term 


Unsatisfactory 


Missing or incomplete description 
of plan for access and 
infrastructure 

Missing any information on 
access mechanisms and policies 

Missing any indication of how 
access strategy will maximize 
value of data 


Missing or incomplete description 
of data to be preserved and no 
description of selection rationale 

Missing or incomplete description 
of preservation resources 
(e.g., hardware or campus 
or commercial services, 
institutional commitment or 
funding) and no plans for 
transfer of responsibility 
APPENDIX B: Data Sharing Exercise Outline for “Evaluating Repositories”


This outline provides an example of the in-class exercise on evaluating repositories in the Data 
Sharing session of the Cornell for-credit course in the spring of 2013. Students were instructed as 
follows: 
1. Use DataBib (http://databib.org) to identify a repository of interest to explore.
2. Next, take 10 minutes to look at the repository individually, noting why you think it
   would be a good or bad fit for your data. Check the repository for:
   a. Supporting organization.
   b. Usage rights, licenses, or other policies related to reuse and redistribution of data.
   c. Technical systems/data security (policies and methods for backup, redundancy,
      authentication, formats accepted).
   d. Preservation commitment.
3. In groups of 3–4, take the next 10 minutes to discuss your findings.
4. Use 5–10 minutes per group to report out to the class your group’s answers to the
   following:
   a. Repositories chosen.
   b. The top reasons for choosing a repository.
   c. The top reason for not choosing a repository.
   d. Whether or not DataBib was helpful. Other information you would like to have to
      make your choices easier.
APPENDIX C: Assessment Tool


NTRES 6940 End-of-Class Evaluation 


Thank you for taking NTRES 6940! This survey will help us measure to what degree we
accomplished our goals, discover what we can do to improve the class for future students,
and inform our grant project. It is completely anonymous and doesn’t reflect on your grade.
Thank you for taking the time to complete the survey.


Q1 Primary departmental affiliation (e.g., NTRES) 


Q2 Year in program
First (1)
Second (2)
Third (3)
Fourth (4)
Fifth (5)
Sixth (6)
N/A or Other (7)


Q3 How many class sessions did you attend?
One (1)
Two (2)
Three (3)
Four (4)
Five (5)
Six (6)


Q4 How would you rate your knowledge/skills/ability in the following areas before taking 
this class? 
No competence (1), Little competence (2), Somewhat competent (3), Very competent (4), 
Not applicable (0) 

Describing data management and its importance and relevance to you 
Describing your research and data collection process to identify your data life 
cycle 
Recognizing the necessary components of a data management plan 
Describing and following best practices in structuring relational databases 
Visualizing data and creating graphs 
Evaluating data repositories to determine requirements and fitness for data deposit 
Documenting your data for yourself and others 
Creating a data management plan to manage and curate your own data 
Q5 How would you rate your knowledge/skills/ability in the following areas after taking this
class?
No competence (1), Little competence (2), Somewhat competent (3), Very competent (4),
Not applicable (0)

Describing data management and its importance and relevance to you
Describing your research and data collection process to identify your data life
cycle
Recognizing the necessary components of a data management plan
Describing and following best practices in structuring relational databases
Visualizing data and creating graphs
Evaluating data repositories to determine requirements and fitness for data deposit
Documenting your data for yourself and others
Creating a data management plan to manage and curate your own data
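
One way the paired Q4 and Q5 ratings could be summarized is sketched below in Python. The function name `mean_change` and the sample responses are hypothetical, invented only for illustration. The sketch assumes that "Not applicable (0)" answers are left out of the comparison rather than counted as a score of zero.

```python
from statistics import mean

NOT_APPLICABLE = 0  # "Not applicable (0)" on the Q4/Q5 scale


def mean_change(before, after):
    """Return the mean after-minus-before change for one skill area.

    `before` and `after` are parallel lists of ratings (1-4, or 0 for
    "Not applicable"), one pair per respondent. Pairs in which either
    rating is "Not applicable" are skipped. Returns None if no pair
    remains.
    """
    changes = [
        a - b
        for b, a in zip(before, after)
        if b != NOT_APPLICABLE and a != NOT_APPLICABLE
    ]
    return mean(changes) if changes else None


# Hypothetical responses from four students for a single skill area.
q4_before = [1, 2, 0, 3]
q5_after = [3, 3, 4, 4]
print(mean_change(q4_before, q5_after))
```

Skipping the "Not applicable" pairs keeps a 0 from being read as a very low competence rating, which would otherwise distort the average change.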
INTRODUCTION


This Data Information Literacy (DIL) team, 
one of two Purdue University teams in the 
Institute of Museum and Library Services 
(IMLS)-funded project, partnered with
software design teams involved with
Engineering Projects in Community Service (EPICS),
a course for undergraduate students from a 
variety of disciplines. We primarily worked 
with the graduate teaching assistants (TAs) 
who graded undergraduate design submis- 
sions produced during the design cycle. The 
software teams created code-based data sets 
and supporting documentation in a variety of 
languages and platforms. The creation of code 
documentation was the primary DIL need of 
the software teams. 

To respond to these needs, the Purdue DIL 
team developed a rubric that provided guid- 
ance for students to create and TAs to evaluate 
the documentation. Our team created a series 
of suggested exercises for students that tied 
specific data management activities to phases 
of the engineering design cycle used by EPICS 
(Lima & Oakes, 2006). We then implemented 
an embedded librarian service within the soft- 
ware teams. We handed out the rubrics and 
suggested exercises, offered a skill-training ses- 
sion to further enrich the students’ knowledge, 
met with the TAs to help them understand the 
document, and then served as design reviewers 
(outside assessors) for the teams. 

To assess the intervention, we used the design 
notebooks created by individual team members 
to identify instances where the students dem- 
onstrated DIL objectives. We created a coding 
schema that standardized notebook analysis 
across teams. The assessment concluded that 
on the individual level, students did not ad- 
equately record their coding decisions or ar- 
ticulate the rationale behind these decisions. 


While students showed a range in skill level 
in personal mastery of DIL, widespread weak- 
ness was evident in the competencies of data 
management and organization, data curation 
and reuse, and data quality and documentation. 
The core of our program was the integration 
of librarians within a preexisting, highly struc- 
tured course. In the future, we plan to focus 
on implementing a role within the team that is 
responsible for ensuring that the documenta- 
tion is of sufficient quality that it can be easily 
understood and is complete enough to ensure 
continued development of the project. 


ENVIRONMENTAL SCAN OF DATA 
MANAGEMENT PRACTICES 
FOR SOFTWARE CODE 


Data curators and digital preservation experts 
are paying more attention to software code as it 
is not uncommon for code to be an important 
component of a data set or other electronic 
object (Matthews, Shaon, Bicarregui, & Jones, 
2010). If the data set is to be curated effec- 
tively, it logically follows that the accompany- 
ing code must be accounted for in all curation 
planning and activities. Managing and curat- 
ing software code as a component of a data 
set presents several challenges in addition to 
the ones that would otherwise be encountered 
in curating data. These challenges include the 
myriad of components and dependencies of 
code (such as externally focused documenta- 
tion, internal documentation, multiple ver- 
sions of iterative code created, and so forth), 
the practice of building on or incorporating 
code developed over time or from multiple au- 
thors, and the rapid pace of new technologies 
that are introduced and adopted by software 
code writers. Therefore, data sets that include
software code may require additional planning
and consideration.

Although the literature on the curation of 
software code as a component of a data set 
specifically is relatively limited, there is a great 
deal of literature that touches on the 12 DIL 
competencies and software code more gener- 
ally. Data management and organization, and 
what we referred to in the DIL project as data 
quality and documentation in particular, have 
received a significant amount of attention. We 
focused our environmental scan on a subset of 
material that appeared most relevant to address 
the issues faced by EPICS. We also selected a 
range of materials that touched on each of the 
12 competencies in some way. The selected ma- 
terials in our review included scholarly articles, 
trade publications, reports, books, and websites 
to incorporate the perspectives of both academ- 
ics and professionals in the field. 

This environmental scan was helpful in in- 
forming our work in several ways. Code de- 
velopers have a reputation for sharing their 
work with others as a matter of practice. For 
example, the ideas of “open source” and “open 
access” are assumed to be a strong component 
of the culture of practice of developers, which 
was largely supported in our literature review 
(Crowston, Annabi, & Howison, 2003; Hal- 
loran & Scherlis, 2003). However, despite an 
ethos and willingness to share code, many de- 
velopers do not provide the documentation 
necessary for others to understand or make use 
of their code easily (Sojer & Henkel, 2010; 
von Krogh, Spaeth, & Haefliger, 2005). Fur- 
thermore, code comments or other descrip- 
tions are often absent, or do not reflect the 
intent of the coder sufficiently, making it dif- 
ficult if not impossible to understand the deci- 
sions made in developing the code (Marcus & 
Menzies, 2010; Menzies & Di Stefano, 2003). 
This is despite the availability of resources to 
assist in the documenting process in software
repositories and the availability of tools such as 
Doxygen (n.d.). Software coding is frequently 
a collaborative activity, particularly in the 
workplace, as coders will often be assigned to 
work on existing code as a part of a team whose 
membership will change as collaborators tran- 
sition in and out of a project. Documentation, 
description, and organization of code are all 
recognized as important activities for a soft- 
ware group, but they are often activities that 
are neglected (Lethbridge, Singer, & Forward, 
2003). Many researchers in the computer
science field present these issues as research
questions to solve and suggest technology-based
solutions to address them (Bettenburg,
Adams, Hassan, & Smidt, 2010; Grechanik et
al., 2010; Hasan, Stroulia, Barbosa, & Alalfi,
2010). However, these proposed solutions are,
by design, often more theoretical than applied
and are therefore of limited practical value.
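
The gap described above, comments that are absent or that fail to capture the coder's intent, can be illustrated with a short sketch. The example is in Python rather than a language named in this chapter, and the function and its scenario (`days_until_due`, a loan-period calculation) are hypothetical. It uses an ordinary docstring and an inline comment; it does not use Doxygen-specific markup.

```python
from datetime import date


def days_until_due(checked_out: date, today: date, loan_days: int = 14) -> int:
    """Return the number of days left before a borrowed item is due.

    A negative result means the item is overdue by that many days.

    Design note (the "why"): the due date is computed from the checkout
    date instead of being stored, so that changing `loan_days` later
    applies to items that are already on loan.
    """
    # Whole days only: time of day is deliberately ignored so that an
    # item is never reported overdue partway through its due date.
    elapsed = (today - checked_out).days
    return loan_days - elapsed


# Hypothetical dates, invented for illustration.
print(days_until_due(date(2012, 9, 3), date(2012, 9, 10)))
```

The docstring states what the function returns, while the design note and inline comment record decisions that a later developer could not recover from the code alone.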

The environmental scan led to several other 
observations and findings that informed our 
work with EPICS. We noted some related in- 
terests within the curation and software com- 
munities but found that they used different 
terminologies in expressing these interests. 
For example, the idea of “software traceability,”
or the practice of recording design decisions
including the who, what, where, when,
and why and explicitly connecting these
decisions to the software for the purposes of quality
assurance (Ali, Gueheneuc, & Antoniol, 2011;
Bashir & Qadir, 2006), has commonalities
with the data curation idea of “provenance,” or 
tracking and accounting for actions and deci- 
sions made in curating a digital object (Bashir 
& Qadir, 2006). Traceability is a quality assur- 
ance process ensuring that design decisions are 
readily identified and accounted for over the 
course of developing the code. Provenance is 
tracked to ensure the integrity of the existing
object and to demonstrate compliance with the 
policies and practices of the repository. It is the 
difference between developing something and 
maintaining it. We also came across a school of 
thought that advocated for “literate program- 
ming” and “human readable code.” The essence 
of the argument was that rather than creating 
code to solely be machine readable, developers 
should create code with the deliberate intent of 
making it suitable for human reading as well 
(Knuth, 1984). An offshoot of this idea, “clean 
code,” was particularly useful in planning our 
educational programming (Martin, 2008). Fi- 
nally, the need to preserve software code seems 
to be catching on in the data curation field, 
though we did not observe this as much in the 
software literature, where there seems to be a 
“technology moves too fast” mentality (Chen, 
2001). One particularly useful resource in this 
area of preservation is the Software Sustainabil- 
ity Institute (http://www.software.ac.uk/), which 
provides services and resources to ensure that 
software used in research is available and sup- 
ported beyond its original life span. 
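
The shared core of traceability and provenance, a record of who decided what, where, when, and why, tied to a specific piece of code, can be sketched as a small data structure. This is an illustrative Python sketch only. The class name `DesignDecision`, its fields, and the sample entry are assumptions made for this example and are not drawn from EPICS or from the cited literature.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DesignDecision:
    """One design decision, linked to the code it affects."""

    who: str      # person or role that made the decision
    what: str     # the decision itself
    where: str    # file or module the decision applies to
    when: date    # date the decision was made
    why: str      # rationale, the part most often left unrecorded

    def summary(self) -> str:
        """Format the decision as a single log line."""
        return (
            f"{self.when.isoformat()} {self.where}: {self.what} "
            f"({self.who}; reason: {self.why})"
        )


# Hypothetical entry, invented for illustration.
decision = DesignDecision(
    who="team leader",
    what="store timestamps in UTC",
    where="scheduler/events.py",
    when=date(2012, 10, 1),
    why="volunteers log in from several time zones",
)
print(decision.summary())
```

Making the record immutable (`frozen=True`) reflects the provenance view that an entry, once written, documents what happened and should not be silently revised.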


METHODOLOGY 


Our project partner was Engineering Projects in 
Community Service (EPICS), a service-learning 
center at Purdue University (https://engineering 
.purdue.edu/EPICS). EPICS is focused on 
teaching undergraduates engineering design 
concepts and skills by working with commu- 
nity service agencies to develop customized 
engineering solutions that address real-life 
needs. EPICS brings students from a variety of 
disciplines across the university and academic 
years to work together on a common project. 
Therefore EPICS capitalizes on the diversity of
strengths that the participating students bring
each semester, but also must manage the gaps
in their knowledge and abilities. This is a highly
transitory group of students, with project
personnel turning over each semester as projects
continue until completion. One of the librarians
on this project, Megan Sapp Nelson, worked 
with EPICS on previous projects and had de- 
veloped a strong understanding of their infor- 
mation needs generally, as well as their working 
culture. As an advisor to EPICS software teams 
for 4 years, she was familiar with the highly 
structured nature of the design course and had 
previously developed information literacy edu- 
cation interventions to improve the quality of 
the conceptual design performed in the projects 
(Sapp Nelson, 2009, 2013). From past experi- 
ences, she was aware that students had difficulty 
managing their software code and documenting 
their work, which presented problems for all in- 
volved, including future students coming into 
the project, faculty advisors and administrators 
in EPICS, and the community partners who 
will make use of the students’ projects. 

The DIL team interviewed four faculty and 
two graduate students in the spring of 2012 
(the instrument is available for download at 
http://dx.doi.org/10.5703/1288284315510).
To incorporate a broad perspective on manag- 
ing and curating software code, we interviewed 
individuals who were affiliated and unaffiliated 
with EPICS and who came from three
disciplines. Table 5.1 shows the affiliations of the
interviewees.


Results of the Needs Assessment


Both the faculty and students rated each of the 
12 DIL competencies on a 5-point scale ac- 
cording to how important it was for graduate 
students to master the competency. The rating
results by our six participants are presented in 
Figure 5.1. 
TABLE 5.1 Purdue DIL Team Interviewees by Department and Affiliation


DIL Interviewee       Academic Discipline      EPICS Affiliation

Faculty #1            Electrical engineering   Affiliated
Faculty #2            Engineering education    Affiliated
Faculty #3            Computer science         Nonaffiliated
Faculty #4            Computer science         Nonaffiliated
Graduate student #1   Electrical engineering   Nonaffiliated
Graduate student #2   Computer science         Nonaffiliated


Among the top DIL competencies for the
faculty we interviewed were data quality and
documentation and metadata and data
description. It is interesting to note that faculty rated
these two competencies much higher than the
graduate students did, demonstrating a
disconnect between the attitudes and perceptions of
faculty and students in these areas.
Furthermore, these two are highly rated within the 12
competencies on average, despite students
indicating that they place less importance on them.
Faculty recognized data quality and
documentation in developing software code as a
weakness in students. While students are frequently
instructed to document code development,
they often misunderstand what this
documentation should consist of and the degree to which
quality documentation is necessary, which
leads to high variability
in their team’s performance and in the quality
of the code. Faculty recognized metadata and
data description as important. However, while
faculty were aware of the need for metadata,
they reported that they themselves did not have
the understanding or skills to apply metadata
or to teach their students about it.

Conversely, graduate students rated data
conversion and interoperability and discovery and
acquisition higher in importance than the
faculty did. For data conversion and interoperability,
this is likely due to one faculty member stating
that her lab did not engage in converting data,
and another stating that this was not a skill that
all students needed as long as they had access to
someone knowledgeable in this area. Rather, the
area of particular interest for both faculty and
students within this competency was the
prevention of data loss in the conversion process.
For the discovery and acquisition competency,
the faculty indicated that it may not always be
crucial to the research being conducted. For
example, their projects were not making extensive
reuse of software code. However, the graduate
students stated that they will search for existing
code that performs similar functions to the code
that they were generating, which may explain
their rating of this competency as more
important than the faculty’s. Interestingly, we found
that the primary means of locating existing
code for the graduate students and faculty we
interviewed is a literature search of conference
proceedings. A literature search is then followed
by a Web search to find the project or author’s
website where the code may be available.

On the basis of the interviews, our
environmental scan, and our knowledge of EPICS, we
developed and built the educational
intervention around the data quality and documentation
and the metadata and data description
competencies. Our intended audiences were the
graduate student TAs and their undergraduate team
members in the EPICS program.


Figure 5.1 The average ratings of importance for each of the 12 data competencies for
faculty and students interviewed.


OVERVIEW OF THE 
EPICS ENVIRONMENT 


The EPICS curriculum develops engineering 
design and professional skills in an environ- 
ment intended to be a bridge to the students’ 
professional careers. EPICS is a highly struc- 
tured and intense environment as students 
must take on a fair amount of work in new 
and unfamiliar areas and are held to high stan- 
dards of professionalism by their instructors. 


This environment requires students to take 
initiative in developing their assigned projects 
independently but with the knowledge that 
their instructors will evaluate their work and 
performance. Consequently, students receive 
rubrics that will be used for evaluations so that 
they better understand what is expected of 
them. Students also learn the design life cycle, 
a framework for developing and executing their 
projects (Lima & Oakes, 2006). Students map 
their work to the stages of the design life cycle 
as they progress through the course. The work 
is performed in teams, and within each team 
students assume particular roles, such as team 
leader or as primary contact for the project 
partner (see Table 5.2). 

TABLE 5.2 Defined Team Roles in the EPICS Curriculum


Role                      Responsibility                                    Faculty, Graduate, or
                                                                            Undergraduate (F/G/U)

Team leader               Team member responsible for overseeing all        U
                          projects conducted by team in a given semester

Project leader/manager    Team member responsible for overseeing work on    U
                          a single project for a given semester

Project partner liaison   Team member responsible for initiating and        U
                          maintaining communication with community partner

Advisor                   Faculty member assigned to oversee the student    F
                          team for a given semester

Graduate teaching         Graduate student responsible for providing        G
assistant                 resources, holding team accountable, and grading


EPICS uses a number of different approaches
to develop these skills. Typically, at the
beginning of the semester, EPICS holds introductory
lectures for students that include distribution of
the rubrics that will be used to evaluate their
performance. Next, students participate in a series
of skill sessions that teach them some of the
fundamentals they will need to know to be
successful, such as programming languages, team
building skills, and appropriate use of laboratory
resources. All students meet for weekly lab
sessions during the semester, where they discuss
their progress and the challenges they have
encountered while working with their team. As the
semester progresses, students present their work
in two separate design review sessions, which
often include a representative from the project
partner organization and professional engineers.
There, students receive feedback and suggestions on
their work and the quality of their presentations.

In EPICS, students are expected to produce
documentation that describes their own work
as well as the decisions and actions taken by
the team to accompany their coding files.
Students organize their data sets using multiple
techniques. The primary sources of
project-level documentation are the design notebooks
or blogs required for completion of the EPICS
class. Students store their notebooks in a
physical location near the lab meeting place or on a
server in their digital form. The internal
project management documents and the external
or user documentation are in a variety of
Microsoft Office files and are located on a server,
wikis, or Subversion (SVN). Teams manage
and store the code itself using SVN. They write
code in languages such as C++ and
JavaScript and use the Android and
Apple mobile platform development tools.
Depending upon the team, there may be several
software code data sets under development at
any given time.

Within the EPICS environment, it is very 
important to be able to share code both within 
a team and outside of it. As projects typically 
span multiple semesters, students will transi- 
tion in and out of the team over the life of a 
project. As such, a need within EPICS is that 
the resulting code and code structure be read- 
ily apparent, logical, and “human readable” to 
facilitate the transition between developers on 
each project. Another consideration is that the 
software code has real-world application
outside of the educational realm. The code is
designed for practical use by nonprofit agencies
in the local community. It is therefore very
important that the code be designed and delivered
in ways that support its ongoing use and
maintenance over time. More information about
EPICS can be found on its website
(https://engineering.purdue.edu/EPICS).

The challenge for the DIL team involved 
supporting the development of useful software 
code products, which was a complex endeavor 
made more complicated by the high rate of 
turnover among team members between semes- 
ters. TAs are asked to hold their undergradu- 
ate student team members accountable for the 
quality of their code during the grading process. 
However, it was evident from the interviews 
that the TAs did not have the experience, com- 
fort level, or tools to grade the quality of the 
code and the documentation that the students 
were submitting, and ultimately they had dif- 
ficulty holding the team members accountable. 

EPICS as a whole did not have a cohesive, 
clearly articulated culture of practice regarding 
the management and documentation of code. 
Some teams agreed to naming conventions for 
files and variables or developed other “local” 
standards, but this was left up to the individual 
teams to decide. Generally, the code writers 
looked to more experienced teammates to pro- 
vide them with standards, rather than develop- 
ing standards among the group by consensus. 
A few faculty advisors provided expectations 
for code documentation, but it was not a stan- 
dard across EPICS and happened infrequently. 
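
A team-level naming convention of the kind mentioned above could be written down and checked mechanically. The sketch below is in Python. The convention itself (lowercase words joined by underscores, with a short extension) and the function name `follows_convention` are hypothetical and are not a standard used by EPICS.

```python
import re

# Hypothetical convention: lowercase letters/digits in words separated by
# single underscores, followed by an extension of 1-4 lowercase letters.
FILE_NAME_PATTERN = re.compile(r"[a-z0-9]+(_[a-z0-9]+)*\.[a-z]{1,4}")


def follows_convention(file_name: str) -> bool:
    """Return True if `file_name` matches the team's agreed pattern."""
    return FILE_NAME_PATTERN.fullmatch(file_name) is not None


# Hypothetical file names, invented for illustration.
for name in ["sensor_reader.cpp", "SensorReader.cpp", "final version.js"]:
    print(name, follows_convention(name))
```

Writing the rule as a single pattern gives incoming team members one place to look for the convention, whatever rule the team actually agrees on.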

Individual teams used a variety of development
tools, as needed, that supported creating
documentation for code, such as JavaDocs
(http://www.oracle.com/technetwork/java/javase/documentation/index-jsp-135444.html)
and Yii (http://www.yiiframework.com/). TAs


supervised more than one team, which meant 
that the TAs had to familiarize themselves with 
the tools that each team was using. On some of
the teams, new students went through multiple
weeks of training on how to use the
tools as well as on introductory coding skills. TAs
provided guidance during this process and one- 
on-one instruction for student coders who were 
having difficulty. 

Faculty advisors generally agreed that the 
level of oversight for student coding projects 
was insufficient. The TAs indicated that part
of the difficulty in providing oversight was that
the measure of quality for the coding was subjective.
Although EPICS faculty and TAs raised docu- 
mentation, organization, and transferability of 
the software code as serious issues, they had not 
yet developed supporting materials or strong 
cultures of practice in these areas within EPICS. 
Therefore the DIL team saw an opportunity to 
support the work of the TAs, who in turn sup- 
ported the education of undergraduates in the 
EPICS program, through developing resources 
and providing a framework for good software 
code documentation practices. 


AN EMBEDDED LIBRARIAN APPROACH 
TO ADDRESSING DATA INFORMATION 
LITERACY NEEDS 


The DIL team developed goals and learning 
objectives for educational programs based on 
the results of the interviews, environmental 
scans, and previous knowledge of EPICS. They 
had three overarching goals: 


1. To raise the students’ awareness of the
   need to generate quality documentation
   and description of the software code they
   generated

2. To provide students and graduate TAs
   with the knowledge and tools to generate
   quality documentation and description
   for software code

3. To develop a shared cultural practice in
   EPICS based on disciplinary values in
   data management issues, particularly
   issues in quality, documentation, and the
   description of data and software code


Table 5.3 lists the specific learning objectives
for the two target audiences.


TABLE 5.3 Learning Objectives for Students and Teaching Assistants in EPICS


Target Audience         Learning Objectives

Undergraduate           Recognize that documentation and description are integral
students who are        components of developing software code (and are not simply
a part of software      “busy work”) in order to hold oneself and team members
development             accountable for producing quality documentation and
EPICS teams will:       description in a timely manner

                        Document own code and methods in developing the code in
                        ways that enable the reproduction of work by others in
                        order to ensure the smooth transfer of work to other
                        students and the EPICS project partner

                        Create and communicate standard operating procedures for
                        managing, organizing, and documenting code and project
                        work within the team in order to develop consistent
                        practice and to facilitate clear communication amongst
                        team members

Teaching assistants     Identify characteristics of well-written software
who lead software       documentation in order to recognize well-written project
development             and software documentation
EPICS teams will:
                        Evaluate project and software documentation in order to
                        identify both positive and negative data practices

                        Critique project and software documentation in order to
                        assess quality and assign grades

Given the structured nature of EPICS 
and the intensity of the work, the DIL team 
found that the students had little time for 
“additional” learning activities or events. So 
we decided to take an “embedded librarian” 
approach to developing and delivering a DIL 
educational program that connected with the 
EPICS structure and culture. Embedded
librarianship can be defined as the process of
presenting information literacy content as a
part of course curricula in ways that are
directly relevant to student outcomes for the
course (Schulte, 2012). Embedded librarian- 
ship is a particularly promising method for 
implementing information literacy instruc- 
tion due to the presentation of information 
literacy competencies in an immediately rel- 
evant manner (Tumbleson & Burke, 2010). 
Given the project-based nature of the course, 
an embedded librarianship approach appeared 
to best integrate with the course design and 
content that already existed within the EPICS 
program. 

To implement our embedded librarian ap- 
proach, in the fall of 2012 we focused on three 
groups within EPICS. Each of these groups had 
at least one faculty advisor, a graduate student 
TA, and multiple teams of students that each 
worked on a particular project. Our approach 
for implementing our educational program- 
ming was to forge connections with the faculty 
advisors, graduate TAs, and students in EPICS 
by taking advantage of built-in opportunities
to interact with each group. This included


• developing an evaluation rubric for TAs
  to apply to student work;

• offering a skills-based session on
  documenting code and project work;

• attending lab sessions and observing team
  meetings;

• participating as reviewers in the students’
  design review sessions.


To create this educational program, we first 
returned to the literature review, particularly the 
sources that described criteria for developing 
“clean code,” to identify relevant best practices 
and documentation guidance for software devel- 
opers. Next, using the existing rubrics developed 
by EPICS as a guide, we crafted two rubrics 
(Appendix A to this chapter) that the graduate 
TAs could use to evaluate both the code and 
the documentation created by their students. 
We also distributed a one-page document (Ap- 
pendix B to this chapter) to team leaders that 
explained the expectations for quality code and 
described why documentation of code is impor- 
tant. Finally, we shared our work with the TAs 
and made some adjustments based on their feed- 
back. Table 5.4 shows the full schedule. 

We held the skills session on documenting 
and organizing code during the third week of 
the semester. The focus was on helping the team 
leaders in EPICS recognize what constituted 
quality, professional practice in documenting 
and organizing code, and the need for students 
to internalize these practices. The session com- 
prised three modules (see the complete lesson 
plan in Appendix C to this chapter). In the first 
module we presented quotes from articles writ- 
ten by several prominent coders that described 
the attributes of “clean code.” We then dis- 
tributed three examples of code that had been 


generated by previous EPICS teams. We asked 
the class to identify the strengths and weaknesses 
of the code from the perspective of documen- 
tation and organization. We closed this mod- 
ule with a discussion of what constitutes good 
code versus poor code. In the next module we 
discussed why writing well-documented and 
well-organized code matters. We emphasized 
that writing software code is inherently a col- 
laborative activity as the majority of code will 
be used by others, both as a product and also 
as something edited and maintained by other 
coders (future EPICS students in this case). We 
then introduced a coding skills inventory (see 
Table C.1 in Appendix C to this chapter), a list 
of 12 skills to facilitate good coding habits in 
EPICS teams. In the last module, the team lead- 
ers picked one of the skills on the coding skills 
inventory list that they saw as a high priority for 
their team and designed a short learning activity 
that would address this skill. We provided the 
team leaders with activities that could support 
such an intervention (see the list in Appendix D 
to this chapter). We recognized that the teams 
were at different stages in the software develop- 
ment process, so we mapped our list of activities 
to the stages of the design life cycle to facilitate 
this process. Finally, each team leader shared a 
selected skill and activity with the group and de- 
fined the measure of success for the activity. 

Unfortunately, because the skills session was 
voluntary, turnout was poor. While all 
team leaders and project leaders were invited, 
only five students attended from four teams. 
We found that this introduction to DIL skills 
did not reach enough students to instill a 
foundation of good practice. 

As the semester progressed we made frequent 
visits to the EPICS labs. Early in the semester 
we attended a lab for each of the three teams we 
were working with and introduced ourselves to 
the students. We distributed the documentation 
rubric that we had developed. Subsequently, we 
each attended multiple lab sessions for each of 
the three groups over the course of the semester. 
These interactions gave us the opportunity 
to observe how students were developing their 
work and to interact with them (though in a 
limited fashion as lab sessions covered many 
aspects not related to the DIL project). We also 
attended both of the design reviews (7 weeks 
and 14 weeks into the semester) and were able 
to provide some suggestions for their work in 
documenting their code and their projects. 

TABLE 5.4 Embedded Librarian Engagement Activities 

Semester Timeslot: Week 2 
Activity: Introduction 
Description: Initial visit to the EPICS weekly lab session to introduce 
the DIL team and distribute rubric materials to all students 

Semester Timeslot: Week 3 
Activity: Voluntary skills session on documenting and organizing code 
Description: This session was offered to team leaders in EPICS and 
covered the following: 
Module 1—What is good coding? 
Module 2—Why is it important?—EPICS as a stakeholder—appeal for 
coding may not be as relevant—professional—take a poll 
Module 3—How to foster good coding practices in your team 

Semester Timeslot: Weeks 4-6 
Activity: Embedded librarianship 
Description: Observations and consultations in weekly lab sessions 

Semester Timeslot: Week 7 
Activity: Design review #1 
Description: First round of feedback and suggestions for student 
work in documenting their code and their projects 

Semester Timeslot: Weeks 8-13 
Activity: Embedded librarianship 
Description: Observations and consultations in weekly lab sessions 

Semester Timeslot: Week 14 
Activity: Design review #2 
Description: Second round of feedback and suggestions for student 
work in documenting their code and their projects 

Semester Timeslot: Post-semester 
Activity: Assessment 
Description: Collected and reviewed student lab notebooks 

Our approach in assessing this work has 
been twofold. First, we met individually with 
two of the three TAs for the teams (the third 
was unavailable) and two of the faculty advisors 
at the end of the fall 2012 semester. We asked 
about any changes in student behavior they 
observed, changes in their perceptions of these 
topics, and possible next steps for our work with 
EPICS. Although the feedback we received was 
generally positive, no one reported a substantial 
change in student activities in writing code and 
documenting their work. They encouraged the 
DIL team to keep working with EPICS, and as 
a result of these conversations, we developed some 
ideas for the future as described in the 
“Discussion” section. Second, we reviewed the lab 
notebooks that students in one of the groups 
we had worked with had written during the 
fall semester. The DIL team developed a coding 
schema to evaluate student knowledge and 
skills in documenting their work effectively. 
This analysis will enable us to better pinpoint 
areas of need and will inform our work in 
developing more targeted responses. 




DISCUSSION 


The opportunity to embed within a highly 
structured, multiple section class provided this 
Purdue DIL team a broad range of insights for 
actionable next steps, future research, and rec- 
ommendations to the EPICS leadership team. 

First, we identified that the team leader and 
project leader roles are key to the dissemina- 
tion of good data management planning and 
practice within any given team. We identified 
this early through interviews and attempted to 
address this via a one-shot skill session aimed 
at the student project and team leaders. Given 
the low level of turnout and lack of observed 
knowledge/skill transfer from the session, 
we needed to develop a more embedded approach 
to data management skills building. 

Another differentiating aspect of the EPICS 
environment is the assignment of specific roles 
to students within their groups. 
Teams in EPICS select 
their project and team leaders early in the se- 
mester, along with more specific roles such as 
the webmasters, project partner liaisons, and 
financial officers, among others. Despite the 
near ubiquity of teams encountering issues 
with the documentation done by previous stu- 
dents, teams do not acknowledge this issue in 
their meetings or do much to address it for- 
mally. A defined role for a student member of 
a team might ensure that code documentation 
and description of the project were carried out 
efficiently and in ways that ensured a smooth 


transition from semester to semester, as well as 
from EPICS to the community agency when 
the project is done. The current approach of 
having students share the responsibility of 
documentation and description instead of des- 
ignating a member of the team to have direct 
ownership of these tasks is a major cause of the 
low-quality documentation and difficulties in 
the transfer of work. 

Therefore, the DIL team proposed a pilot 
project for the fall of 2013 to define and imple- 
ment a project archivist role within selected 
EPICS teams. The purpose is to integrate fully 
the oversight of documentation formally within 
the team structure by creating a specific team 
role. We envision the project archivist’s role as 
taking a big picture approach toward capturing 
the description and documentation of the proj- 
ect, including the design constraints, decision- 
making processes, and design implementations 
for each team. As a result, the EPICS teams 
might see smoother transitions of the project 
to future team members, graduate teaching assistants, 
faculty advisors, EPICS administrators, 
and project partners. We will be working with 
a continuing lecturer and an EPICS advisor to 
further define, implement, and assess the impact 
of the project archivist role. 

Second, while the rubrics for evaluating 
software code and documentation that we 
developed are a good start, there is a need for 
further curricular development to integrate the 
rubric into the EPICS workflow for the semes- 
ter. A high priority will be to address the indi- 
vidual and team documentation templates used 
by EPICS. Currently, these templates do not 
highlight the need for excellent coding prac- 
tices and data management. Working with the 
EPICS administrative team, we hope to create 
a template or other workflow that highlights 
the need for well-designed and well-written 


code while providing a structure for individual 
and team-level accountability. These resources 
will support the TA’s role as a mentor to EPICS 
students, using a train-the-trainer approach. 

Another need that the DIL team identified 
was a central reference solution that enables 
students (both undergraduate and graduate) 
to learn needed data skills at their point of 
need, while working either independently or 
in a laboratory setting. We feel that a library 
of short videos (perhaps hosted on a YouTube 
channel) that covers software and data manage- 
ment topics would be highly useful to EPICS. 
The EPICS curriculum is built around the idea 
of working independently to write code that is 
then brought back to the group for further de- 
velopment. It is important that students have 
instruction on clean coding, creating excellent 
documentation, and project management plan- 
ning that is available to them outside of class. 
Similarly, graduate students frequently work 
independently, submitting code to their su- 
pervisor for comment and review. A YouTube 
library would create a ready reference for those 
needs that arise while the students are practic- 
ing or expanding their skill sets. 

Finally, we noted that the depth and qual- 
ity of project documentation and reflection 
captured in the team members’ lab notebooks 
varied widely. The highest-order learning 
skills in Bloom’s taxonomy (Bloom, 1956), 
evaluation and analysis, were not often 
present within the EPICS notebooks, even 
as the students were engaging in a creative 
process. Evaluation and analysis are at the heart of 
excellent data management skills; by looking 
at the long-term life span of the project, stu- 
dents identified the immediate worth of clean 
code not only for themselves but also for fu- 
ture EPICS team members, project partners, 
clients, and users. Working with the EPICS 
administrators, we hope to emphasize the reflective 
practice of code writing, particularly for 
software and hardware engineering disciplines. 


CONCLUSION 


Our approach to developing and implementing 
a DIL educational program was to 
embed ourselves in the structure and environment 
of EPICS. Embedded librarianship was a natural 
choice given the highly structured nature of the 
EPICS program and engineering disciplines. 
This approach allowed us to reach a relatively 
large number of students (approximately 40) in 
ways that aligned with their current practices. 
However, employing an embedded librarian 
approach in our program took a great deal of 
planning and investment for the DIL team to 
set up and carry out. 

Several interrelated factors should be ad- 
dressed in this type of DIL model. First, the 
embedded librarian approach requires that 
librarians build solid relationships with the 
people running the program. When a librarian 
is embedded in a course, this may include just 
the faculty instructor and his or her teaching 
assistant. We decided to partner with a service- 
learning center and to focus our efforts on three 
groups and their graduate student TAs oversee- 
ing the work of multiple teams of students. This 
structure required us to build connections with 
the faculty advisors, the graduate student TAs, 
the EPICS administration, the student team 
leaders, and others. Sapp Nelson’s prior experi- 
ence aided our relationships in working with 
EPICS, as did Carlson’s previous interactions 
with one of the faculty advisors. Nevertheless, 
our approach still required multiple meetings 
to introduce ourselves, explain what we were 
trying to do, and establish contact with a great 
number of people. We recommend that librarians 
who wish to launch a DIL program plan 
to cultivate and maintain relationships as a part 
of their program development. 

Second, we worked hard to align our efforts to 
fit into the structure of our partner. EPICS has a 
very structured way of doing things that did not 
allow for a great deal of deviation. Therefore, we 
had to identify these structures early on and then 
determine how best to integrate ourselves to reach 
students in meaningful ways. We took advantage 
of opportunities to reach students, such as holding 
a voluntary skill session early in the semester and 
attending design reviews at the midpoint and end 
of the semester. However, we also had to create 
additional ways of connecting with students within 
the EPICS structure. Our approach was to align 
our instruction and interactions as best we could 
with current practices. We did this by creating 
a rubric for evaluating student documentation 
and organization practices and making ourselves 
available during some lab sessions. 

Third, the embedded librarian approach re- 
quired a fairly significant time commitment. 
In addition to the time that we invested in 
identifying which of the DIL competencies to 
address and in developing the knowledge to 
design an educational program to respond, the 
DIL team put in many hours attending lab ses- 
sions and design reviews, offering the skill ses- 
sion, developing resources, and meeting with 
faculty advisors and TAs affiliated with EPICS. 
We believe that the in-person contact was 
worth the effort as it definitely helped make an 
impact, forge relationships, and better under- 
stand the EPICS environment. However, it was 


occasionally difficult to find the time to devote 
to making these personal appearances given 
our other responsibilities and because we fol- 
lowed EPICS’s schedule rather than our own. 
The time commitment continues as we review 
the content of team lab notebooks to better de- 
termine the impact the DIL program had on 
students and to observe where the strengths and 
weaknesses in their DIL competencies lie. Here 
too, we believe that the time commitment in 
assessing student work will pay off as we con- 
tinue to develop our partnership with EPICS. 

Beyond the lessons learned from developing 
the program itself, we gained a better 
understanding of the 12 DIL competencies from 
the interviews. We decided to focus on only 2 
of the 12 competencies for our work with EPICS 
on the basis of its needs and our ability to 
respond to those needs. However, the needs 
expressed were many and may provide additional 
opportunities for follow-up. In particular, both 
the faculty and the students we interviewed in- 
dicated that competency with data visualization 
and representation was important. In addition to 
the breadth of needs expressed in the interviews, 
we observed wide variations in baseline skills of 
students working with EPICS. For this project, 
we deliberately kept the definitions of the com- 
petencies loose, as we wanted interviewees to 
express their opinions and perspectives on the 
competencies with little direction or interference 
from us. For our work with EPICS on data quality 
and documentation, it was clear that success 
depended largely on one particular 
skill in that competency: “Documents data 
sufficiently enough to enable the reproduction of the 
research results and the data by others.” 
However, we needed to define what this statement 
really meant for EPICS and how it was (or was 
not) understood by the students, TAs, faculty 
advisors, and EPICS administration to be able 
to respond effectively. Learning the context and 


gaining an understanding of the setting were as 
important to our program as defining our terms. 
This was very much an iterative process. 


NOTE 


This case study is available online at 
http://dx.doi.org/10.5703/1288284315477. 

APPENDIX A: Rubrics for Evaluating Software Code and Documentation 
Produced in the Purdue University EPICS Program 


EPICS SOFTWARE CODE RUBRIC 

Outcome: The code performs as intended 
Expected: The code produces the desired performance in a timely, 
straightforward, consistent, concise, simple, and logical manner 
without extraneous elements. 
Acceptable: The code produces the desired performance, but includes 
elements that add to processing time, add unnecessary complexity, 
unnecessarily lengthen the code, contain inconsistencies, or obscure 
logic. 
Unacceptable: Code fails to perform. 

Outcome: The code is human readable 
Expected: The code itself and corresponding documentation are easily 
understood by a person with a basic understanding of the coding 
language used. The code is straightforward and intuitive. 
Acceptable: The code and documentation are easily understood in most 
places. In some places coding is convoluted or wordier than is 
necessary. 
Unacceptable: The code and documentation are generally not easily 
understandable. The code includes esoteric coding strategies. 

Outcome: The code contains meaningful names 
Expected: Meaningful names (names that clearly convey the distinct 
purpose, behavior, or intent of a particular element of the code) are 
used in all facets of the code, including variables, procedures, 
functions, classes, and objects. Names should be distinct, 
descriptive, non-redundant, and technically correct. 
Acceptable: Meaningful names are used in most of the facets of the 
code, including variables, procedures, functions, classes, and objects. 
Unacceptable: Meaningful names are used only occasionally in the 
facets of the code, including variables, procedures, functions, 
classes, and objects. 

Outcome: The code is consistent 
Expected: The code follows standardized rules, conventions, or a 
consistent logical pattern in its structure, format, and use of names. 
Acceptable: The code generally follows standardized rules, 
conventions, or consistent logical patterns in its structure, format, 
and use of names. 
Unacceptable: The code occasionally or rarely follows standardized 
rules, conventions, or consistent logical patterns in its structure, 
format, and use of names. 

Outcome: The structure and layout of the code is appropriate and 
logical 
Expected: The structure and layout of the code consistently follows a 
logical order that conveys meaning to the reader. Variables are always 
placed in close proximity to their use. 
Acceptable: The structure and layout of the code generally follows a 
logical order that conveys meaning to the reader. Variables are often 
placed in close proximity to their use. 
Unacceptable: The structure and layout of the code does not follow a 
logical order that conveys meaning to the reader. Variables are not 
placed in close proximity to their use. 

Outcome: The code makes appropriate use of comments 
Expected: Comments in the code are consistently clear, concise, and 
informative and consistently explain intent or assumptions made by the 
author as needed. Comments in the code always contain appropriate 
information about the code and do not duplicate other sources of 
documentation. Comments appear as close to the part of the code they 
refer to as possible. Comments reflect the current state of the code 
and have been updated when the code was updated. 
Acceptable: Comments in the code are usually clear, concise, 
informative, and sufficient in explaining intent or assumptions made 
by the author. Comments in the code generally contain appropriate 
information about the code, but may duplicate other sources of 
documentation. Comments often appear close to the part of the code 
they refer to. Comments generally reflect the current state of the 
code. 
Unacceptable: Comments in the code are occasionally clear, concise, 
and informative and do not really explain intent or assumptions made 
by the author. Comments in the code do not generally contain 
appropriate information about the code, or often duplicate other 
sources of documentation. Comments do not appear close to the part of 
the code they refer to. Comments are outdated and do not reflect the 
current state of the code. 

EPICS SOFTWARE DOCUMENTATION RUBRIC 

Outcome: Documentation describes functionality of code 
Expected: The documentation includes clear information on the purpose 
of the code and what the code does. 
Acceptable: The documentation only partially describes the purpose and 
functionality of the code, and is at times unclear to non-team 
members. Documentation omits minor details. 
Unacceptable: The documentation fails to describe the purpose and 
functionality of the code in a way that is understandable to non-team 
members. Documentation omits multiple and/or important details. 

Outcome: Documentation describes the composition of the software 
package 
Expected: All constituent parts of the software code and accompanying 
hardware/documentation are identified, and interrelationships between 
the parts are clearly identified. 
Acceptable: Most constituent parts of the software code and 
accompanying hardware/documentation are identified, and most 
interrelationships between the parts are clearly identified. 
Documentation omits minor details. 
Unacceptable: Significant numbers of constituent parts are not 
identified and/or the interrelationships are not adequately identified 
to enable non-team members to orient themselves to the software 
package. Documentation omits multiple and/or important details. 

Outcome: Documentation accounts for significant changes and tracks 
versions of the code 
Expected: Changes are tracked via a versioning system; ownership of 
changes is clearly noted in the documentation so that future team 
members can retrieve background information from blogs/notebooks. 
Acceptable: The majority of changes are tracked; ownership of changes 
is not always identified. 
Unacceptable: Changes aren’t recorded and responsibility for those 
changes is not recorded consistently. Documentation omits multiple 
and/or important details. 

Outcome: Documentation describes human-centered design decisions 
Expected: All interfaces and human interaction points with the 
software are documented, along with the software designer's 
decision-making process for those interactions. 
Acceptable: Interfaces are identified, but it is not clear to future 
team members why or how those decisions were made. Documentation omits 
minor details. 
Unacceptable: Interfaces are not identified. Neither are decisions 
identified. Documentation omits multiple and/or important details. The 
future team member must make assumptions or create a backstory for 
code decisions made. 

Outcome: Documentation describes the software environment 
Expected: All hardware, operating systems, programming languages, 
compilers, software libraries, other software packages, and 
peripherals necessary for using/understanding the code have been 
identified. 
Acceptable: There have been oversights in listing component systems 
and languages. Documentation omits minor details. 
Unacceptable: Little or no attempt has been made to collocate the 
components the software design and programming relies upon. 
Documentation omits multiple and/or important details. 

Outcome: Documentation describes software architecture decisions 
Expected: In the case of databases and apps, decisions on the 
architecture (backups, client/server operations, or peer-to-peer 
interactions) have been fully explained in the documentation. 
Acceptable: The system architecture is understood, but requires 
research in the code in order to identify the underlying architecture 
fully. Documentation omits minor details. 
Unacceptable: The documentation does not have enough information for 
the underlying architecture of the system to be easily understood and 
the code does not make it clear. Documentation omits multiple and/or 
important details. 

Outcome: Documentation is updated in a timely and consistent manner 
Expected: Documentation is up to date for all changes to that point in 
the semester. Documentation is digitally signed by the code author and 
dated. 
Acceptable: Documentation is somewhat behind the current state of the 
code. Documentation is digitally signed by the code author and dated. 
Unacceptable: Documentation is completely out of date and/or has not 
been signed and dated by the code author. 
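
One criterion in the documentation rubric asks that the software environment be identified. As a rough sketch, assuming a project written in Python, part of that record could be generated by a short script rather than typed by hand. The function name `describe_environment` and the package list passed to it are hypothetical; hardware, compilers, and peripherals would still need to be described manually.

```python
import platform
import sys
from importlib import metadata


def describe_environment(package_names):
    """Collect basic facts about the software environment as a dictionary."""
    packages = {}
    for name in package_names:
        try:
            packages[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap explicitly instead of silently skipping it.
            packages[name] = "not installed"
    return {
        "operating_system": platform.platform(),
        "python_version": sys.version.split()[0],
        "packages": packages,
    }


if __name__ == "__main__":
    # "pip" is only a placeholder; list the libraries the project actually uses.
    for key, value in describe_environment(["pip"]).items():
        print(f"{key}: {value}")
```

The printed lines could be pasted into the team's design documentation and regenerated whenever the environment changes, which also supports the rubric's expectation that documentation stays current.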

APPENDIX B: One-Page Handout for EPICS Graduate TAs 
Introducing the Project 


Software Coding and Documentation Practices for EPICS 


While researching the coding and documentation practices of electrical and computer engineer- 
ing and computer science programmers, we noticed consistent gaps in the coding and documen- 
tation practices of previous EPICS teams. With feedback from advisors and TAs, we developed 
rubrics to hold software creators and their teams accountable for producing quality software and 
documentation. 


Why is this important? 

As you know, EPICS teams continue from semester to semester. It is very challenging for new 
team members to join a project team when the documentation, comments, and quality of the code 
produced by previous teams are difficult or impossible to understand. Existing team members may 
or may not be available to explain how design decisions were made. Regardless, poor coding and 
documentation practices needlessly slow down the project and require that the delivery be pushed 
back as teams second-guess or are forced to recreate decisions. Most professional positions will be 
situated in a team environment and will develop components of software code across multiple 
teams. Therefore, gaining an understanding of good documentation skills and being able to 
demonstrate these skills in one’s code are critical skills for EPICS students. 


What are the expectations for quality code? 
The following questions help define expectations for quality code: 


• Does the code work as it was designed? 

• Can a non-team member easily understand what the code is doing? 

• Are the names chosen in the code meaningful to an outside code reader or user? 

• Is the code internally consistent in naming and other conventions? 

• Do the structure and layout of the code assist a non-team member in understanding the 
code? 

• Do the comments assist a non-team member in reading and understanding the code? 

• Does the documentation identify all major decision points, relationships, components, 
and operational features of the code? 


Is this extra work? 

Yes and no. This is a grading structure for existing deliverables, including the design documenta- 
tion and individual blogs/notebooks. Your team might already be doing these things on a routine 
basis. However, if you have not been coding with these criteria in mind, then you will need to take 
some action to ensure that they are included in your workflow. 


Do you have questions? Contact Megan [Megan’s e-mail] or Jake [Jake’s e-mail]. 

APPENDIX C: Skills Session Lesson Plan for Week 3 of the 
EPICS Data Information Literacy Intervention 


Organizing, Managing, and Documenting Software Code 


Objectives 
Recognize desired coding habits of software professors and companies to internalize the need for 
good code habits. 
1. Articulate observable differences between poor coding product and good human readable 
code product to identify practices that support or denigrate code products. 
2. Identify ways that good coding facilitates quality outcomes and facilitates the completion 
of the EPICS project to build a case for developing strong coding habits. 
3. Determine your expectations and define what resources and tools your team will need to 
carry out these expectations to foster a work environment that supports a culture of good 
coding practices. 


Module Themes 


The skills session will be presented in three modules: 


• Module 1—What is good coding? 

• Module 2—Why is good coding important?—EPICS as a stakeholder—appeal for coding 
may not be as relevant—professional—take a poll. 

• Module 3—How to foster good coding practices in your team. 


Curriculum Methods 

Module 1 

Lecture—Goal 1: Introduction. 

Visual—Goal 1: Convey desired coding habits quotes from interviews. 

Needed resources: Computer; projector; PowerPoint; quotes regarding coding habits. 
Quotes from Clean Code book on what clean code is (in page number order).* 


• Grady Booch, page 8 

• “Big” Dave Thomas, page 9 
• Ward Cunningham, page 11 
• Bjarne Stroustrup, page 7 


Hands-on lab—Goal 1: compare examples of code snippets generated from previous EPICS teams 
as groups of two to three working together with a TA. 


* See Martin, R. C. (2008). Clean code: A handbook of agile software craftsmanship (Vol. 1, p. 464). Retrieved from 
http://dl.acm.org/citation.cfm?id=1388398 

Needed resources: Examples of code snippets in print or electronic form. 

Group report back—Goal 1: Using the examples, identify as a group what the differences are be- 
tween poor and good code. 

Needed resources: Whiteboard and dry erase markers, eraser. 


Module 2 

Group reflection/discussion—Goal 2: Why does writing good code matter? How does it support 
quality outcomes? (Responding to the question: If the code does what is needed, why does it 
matter?) 


• Code will be used by more than just its author(s). 
• Code needs to have a shelf life beyond its immediate purpose/context (your grade). 


Needed resources: Whiteboard and dry erase markers, eraser. 

Coding Skills Inventory Worksheet—Goal 2: What skills are needed to facilitate good code habits 
among your teams? Review and discuss briefly (and if necessary, solicit additional skills to add 
to this list). 

Needed resources: Worksheet (below) composed of a list of potential skills that students need to have. 


CODING SKILLS INVENTORY WORKSHEET FOR MODULE 2 


Coding Skills Inventory 


• Evaluating code quality
• Establishing norms and consistencies in the structure and organization of the code
• Version control/tracking changes/synchronization of the code (group editing)
• Transferring/inheriting code to and from other project teams (maintaining continuity and avoiding loss of knowledge)
• Ensuring the sustainability of the code
• Ensuring that documentation is updated in a timely manner and accurately reflects the current state of the code
• Documentation of the decisions/actions/processes taken by teams
• Establishing and following team standards in developing documentation
• Developing code and documentation that can be easily understood and used by project partner and stakeholders
• Processes and structure for managing and maintaining the code and documentation as it is being developed
• Review and testing the code to ensure quality and usability
• Understanding/documenting the relationship between project components (e.g., code, documentation) and the software environment as a whole


Module 3

Hands-on lab— Goal 3: As a team of two to three people, pick one of the skills listed on the cod- 
ing skills inventory worksheet as a high-priority need for your team. Using the list of activities 
provided as a starting point, plan a short (5–10 min.) intervention for your team members that
will address this need. 

Needed resources: Coding skills inventory worksheets; activities handout. 

Small team reflection/discussion—Goal 3: How will you know if your intervention has worked? How 
will you know if your team members “got it”? What will you be able to observe that shows that 
the code quality is improving? 

Group reflection/discussion—Goals 3 and 4: What intervention did you plan? How will you know 
that it is successful? 

Needed resources: Computer; Microsoft Word. 


[Instructor notes: Create a transcript of proposed ideas and assessments. Distribute to all partici- 
pants/keep as a part of the assessment of the skill session.] 

APPENDIX D: Design Process-Centered Activities
for Documentation and Organization 


We have developed ideas for possible activities that would help students gain an understanding of
documentation and organization practices for software code and for projects more generally. These
activities are organized according to the phases in the design cycle as depicted in “The Engineering
Design Method for Service-Learning.”** Each stage in this life cycle has a learning objective attached
to it that pertains to the documentation/organization of software code or the project itself.
The activities listed beneath the learning objective are possible ways to teach these objectives to
your students/team members.

Please contact us if you have any questions or if you would like our assistance in further develop- 
ing or running these discussions or activities. 

Megan Sapp Nelson, Subject Liaison for Engineering—[Megan’s e-mail] 

Jake Carlson, Data Services Specialist—[Jake’s e-mail] 


Phase 1. Problem Identification 
Activity: Environmental Scan 
What existing technologies are being used with which your new code must interact? How do they 
function? What interaction is there between your code and that existing code? What conventions 
are used in coding the existing software? What languages are used? What are the key structures of 
the code that will impact your new software? 

Discuss your conclusions with your project team. How do these observations change your un- 
derstanding of the problem identified by the project partner? 


Activity: Skills Identification Inventory 

Complete the skills identification inventory. Which skills do individuals have on the team? What are 
the strengths of each individual? What weaknesses does the project team have? Create a plan to build 
capacity in areas of team weakness. Create a plan to divide tasks on the basis of individual strengths. 


Activity: Infrastructure Setup 
Set up backups and versioning software. Make sure that everyone on the team has access to appro- 
priate directories and understands how to access them. 


Phase 2. Specification Development 

Activity: Personas 

Create personas for the end users who will be using the delivered final product. Who are they? Why 
are they interacting with the software? How do they interact with the software and documenta- 
tion? Are all of their needs met within the current specifications? Who is maintaining this software? 
What are they responsible for? Capture the personas in the project documentation. 


** Lima, M., & Oakes, W. (2006). Service-learning: Engineering in your community. Okemos, MI: Great
Lakes Press.


Activity: Project Partner Interview
Create a list of questions to ask the project partner. In your questions emphasize who the users of 
the software are, the level of expertise of users, plans for maintenance, and situational infrastructure 
that could have an impact on your software design. 

Resources: SharePoint, skill sessions, project partner interview worksheet. 


Activity: Design Matrix 

Create a design matrix identifying the most important specifications the team identified during 
the project partner interview. Consider implications for coding and documentation as you discuss 
these specifications. 


Phase 3. Conceptual Design 
Activity: Discussion 
Hold a discussion with the aim of reaching a consensus on the following issues: 


• Standards for formatting the code covering issues such as indentation, line lengths, use of
comments, and so forth.

• Naming conventions and/or controlled vocabulary for variables, procedures, functions,
classes, and objects used in the code.

• How and to what extent the code will be reviewed and tested to ensure that it functions as
expected.


If applicable, the team should make use of existing standards such as Sun’s Java Code Conven- 
tions or tools such as JavaDoc. However, teams should not adopt standards or tools without review- 
ing them and knowing how they could or should be applied in the EPICS context. It is important 
that teams take ownership of the documentation convention they adopt or develop. 

Outcomes should be written up, shared, and used in practice. The key decision points should
also be captured and stored in a place where they are easily accessible to the team. Team members
should be held accountable for following the decisions for documenting their code and other project
deliverables.
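
As one illustration of what such agreed-upon standards might look like in practice, the following sketch contrasts a function written without conventions with the same function rewritten using a descriptive name, a named constant, and a structured documentation comment. Python is used purely for illustration (the text above refers to Java tools such as JavaDoc), and the function name, the constant, and the docstring layout are hypothetical choices rather than EPICS standards.

```python
# Before: the code works, but the names and the bare number explain nothing.
def f(x):
    return x / 3.28084


# After: the same calculation, written to a (hypothetical) team standard.
FEET_PER_METER = 3.28084  # named constant instead of an unexplained number


def feet_to_meters(length_ft):
    """Convert a length from feet to meters.

    Args:
        length_ft: Length in feet (int or float).

    Returns:
        The equivalent length in meters, as a float.
    """
    return length_ft / FEET_PER_METER
```

Both versions return the same values; the second can be understood, reviewed, and handed to another team without asking its author what it does.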


Activity: Team Covenant 

Create a team covenant. Include the following in the discussion: standards for formatting
(indentation, line lengths, use of comments) and naming conventions for functions, classes, and
objects. The covenant should be written up, shared, and used in team coding.


Activity: Software Tool Selection 

Select a tool for the project team to use to create documentation as code is developed. JavaDoc is 
one such tool. Ask your TA or advisor for further input in the use of tools that will facilitate team 
coding and documentation. 


Activity: “Living” Wire Frame

Identify the various components that need to be built into the code and diagram how the code and 
the functionality of the code map onto each other. Make this a living document for the team that 
changes as further code is added and the code is redesigned. 


Phase 4. Detailed Design 

Activity: Functional Design Outline 

Create a diagram showing the functional design of the code, including human interaction points
and areas that will potentially require ongoing maintenance. Discuss these interaction points,
putting yourselves in the position of the project stakeholders.


• What information will each of the stakeholders need to know about the interaction points
to understand and make use of the code effectively upon delivery?

• What information will each of the stakeholders need to know about the interaction points
to maintain the code?


Document the key decision points and the decisions that are made. Assign individuals to write 
up more substantive descriptions in the project documentation. 


Activity: Team Coding 

Have members of the team “exchange” code they have written and clearly indicate to the author ar- 
eas of code that are not human readable as written. Are the standards that were agreed upon earlier 
in the semester being followed? Are the names clear? Are the functions identifiable? Revise the code 
and exchange with yet another member of the team. Continue until the code is human readable. 


Activity: Code Synopsis 

Create a brief document that synthesizes the coding decisions that you have made. Use it for a de- 
sign review and ask for feedback from design reviewers. Make changes based on the feedback you 
received and enter the document into the team documentation. 


Phase 5. Production 

Activity: Software Code Peer Review 

Team up with another project team involved in generating code. Have each team attempt to use
the code generated by the other team, review the documentation of the other team’s code, and assign a
rating to the quality of the other team’s code using the EPICS software documentation rubric. Each team
should then deliver written recommendations to the other team on how the code could be improved.


Activity: Testing Plan 

Ask your TA for guidance in developing a testing plan for your software that focuses on excep- 
tion testing. Develop and implement the testing plan. Consider bringing in people from outside 
the team to have them test the software. Record your testing plan and any major changes that are 
implemented as a result. 
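
To make the idea of exception testing concrete, here is a minimal sketch using Python's standard `unittest` module. It checks not only that valid input works but also that bad input is rejected with a clear error. The `parse_reading` function and its inputs are hypothetical stand-ins for whatever a team's software actually does; they are not part of any EPICS project.

```python
import unittest


def parse_reading(text):
    """Parse one sensor reading (hypothetical helper) into a float.

    Raises:
        ValueError: If the text is empty or is not a number.
    """
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("reading is empty")
    return float(cleaned)  # float() itself raises ValueError on bad input


class ParseReadingExceptionTests(unittest.TestCase):
    def test_valid_reading_is_parsed(self):
        self.assertEqual(parse_reading(" 12.5 "), 12.5)

    def test_empty_reading_is_rejected(self):
        # The test passes only if the expected exception is raised.
        with self.assertRaises(ValueError):
            parse_reading("   ")

    def test_non_numeric_reading_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_reading("n/a")
```

Saved in a file, tests like these can be run with `python -m unittest`, and the list of exceptional cases they cover can double as part of the written testing plan.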


Activity: Usability Testing

Create a structured usability test. Have representative users (see the personas activity above) test the 
interface. Are they interacting with the interface as anticipated? Are there any changes that should 
be made in light of the discussion? Document any observations and resulting changes. 


Phase 6. Delivery/Transfer 

Activity: Transfer Role-Play 

Role-play the process of transferring responsibility. Recruit team leaders, TAs, or EPICS adminis- 
trators to play the role of the client or the next team leader for the project. 

The situation is that a meeting has been called to transfer the deliverables of the project from the 
project team to the client or to the team leader for the next project team. The student(s) are to pre- 
pare for this meeting by identifying what the client will need to know to understand, implement, 
and use the deliverables, or what the next team leader will need to know to continue the project 
with a minimal amount of disruption. The student will then need to identify how and where this 
information is documented to ensure a smooth transfer. 

An alternative approach would be to reduce the scale of this activity from a focus on the overall 
project to a focus on the code or a portion of the code. The questions would then shift to “What 
does the client/future team leader need to understand about the code to implement and use it?” 
and “How and where is this information documented?” 


Activity: Peer Review of Documentation 

Create a sample “transfer package” of materials (based on an actual project to the extent possible) 
and have students review and evaluate the contents paying special attention to the stated needs of 
the client or the likely needs of future project teams. Have them answer either of the following 
questions: 


1. What will the client need to know to understand, implement, and use the deliverables, and 
how and where is this information included and conveyed in the transfer package? 

2. What will the next project team need to know to continue the project, and how and where 
is this information documented? 


Phase 7. Service/Maintenance 

Activity: Maintenance Documentation 

Have students create an annotated inventory version of the project documentation that indexes the 
sources of the information needed to maintain and/or service the project. 


• What will people need to know to maintain or service the project?

• Where is this information documented, and who is responsible for drafting and maintaining
this documentation?

• How complete is this documentation? Are there any gaps that need to be filled?


As a precursor to this exercise, students could be given a sample annotated inventory to critique. 


Activity: Code Synopsis

Create an outline of the code that points back to files, .svn versions, and other necessary documen- 
tation. Test this document on students who are not affiliated with the project. Can they quickly 
find necessary information to understand the existing code? 
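
One possible starting point for such an outline is to generate a first draft from the code itself and then annotate it by hand. The sketch below assumes the project is written in Python and uses the standard `ast` module to list each top-level function and class with its file, line number, and the first line of its documentation. The sample source and the file name `demo.py` are invented for illustration.

```python
import ast


def outline(source, filename="<source>"):
    """Return one outline line per top-level function or class in source."""
    tree = ast.parse(source, filename=filename)
    lines = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            # Fall back to a visible marker when documentation is missing.
            doc = ast.get_docstring(node) or "(no docstring)"
            summary = doc.splitlines()[0]
            kind = "class" if isinstance(node, ast.ClassDef) else "def"
            lines.append(f"{filename}:{node.lineno} {kind} {node.name}: {summary}")
    return lines


# Invented sample source, standing in for a real project file.
SAMPLE = '''def load(path):
    """Read a file."""
    return path

class Gauge:
    pass
'''

for entry in outline(SAMPLE, "demo.py"):
    print(entry)
```

Entries marked "(no docstring)" show at a glance where the documentation has gaps that an outside reader would stumble over.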


CHAPTER 6


INTRODUCTION


This Data Information Literacy (DIL) project 
team worked with two faculty members in a 
hydrology lab in the Department of Agricul- 
tural and Biological Engineering at Purdue 
University; this was one of two Purdue Univer- 
sity teams participating in the DIL project. The 
data produced by the lab include field-based 
observations, remote sensing, and hydrology 
models to help understand land-atmosphere 
interactions and the hydrologic cycle. Inter- 
views with the faculty and graduate students 
in the research group indicated that data man- 
agement standards were their primary concern. 
These Purdue researchers were neither aware 
of nor using disciplinary-developed data stan- 
dards for storage, sharing, reuse, or description 
of data. Data standards would allow their data 
to be interoperable with other data generated 
by researchers in their field and would prevent 
them from “reinventing the wheel” each time 
data must be shared. Additionally, they were 
very interested in contributing to disciplinary 
standards since they believed that standards de- 
veloped by the community had a better chance 
of being adopted. Over the course of the proj- 
ect, one of the participants became the cam- 
pus representative to a national data repository, 
which gave our program a greater urgency: cur- 
rent and future students who worked in their 
labs must be trained in and use these standards. 

Through user assessment, the DIL team 
members determined that the most impor- 
tant DIL areas to address through instruction 
were creating standard operating procedure 
documents for collecting the lab’s data, find- 
ing external data, and creating metadata. With 
regard to operating procedures, the research 
group indicated that they had some instruc- 
tions for data management listed on their wiki, 
but students did not follow them very often. 


The DIL team determined that the students 
had not internalized the need to manage and 
document data for their own work and to 
share with other members of the group. The 
wiki procedures were not specific enough to 
give students direction to successfully manage 
their data. Students also needed to incorporate 
external data—for example, using weather/cli- 
mate data as inputs in their simulations. Locat- 
ing, understanding, cleaning, and formatting 
those data is not a trivial process, and students 
can save significant time if the data are in a for- 
mat that is usable by or easily importable into 
their programs. Finally, metadata was the key 
to effectively organizing, managing, and dis- 
seminating data. The more one knows about 
the contents of a data set, the more likely one 
can make the right choice about whether to use 
it. So, a well-documented data set will be more 
visible, comprehensible, and potentially useful 
to the research community at large. 

We determined that the most effective ap- 
proach to teach these skills within the time con- 
straints of the research group was to conduct 
three instruction sessions over 3 months during 
the lab’s normally scheduled meetings. Embed- 
ding the instruction within the lab’s meeting 
schedule emphasized (1) how important the data 
skills were to the faculty members, and (2) that 
there was an urgent need to embed community 
standards for data management and curation 
into everyday practice. Overall, this approach 
to instruction was to present a contextualized 
program, grounded in the actual activities and 
procedures of the group, to reinforce the practi- 
cal need for DIL skills and attitudes and increase 
buy-in from the lab group members. 

We developed a different assessment for each 
module, appropriate for the range of learning 
objectives. The results of the assessment re- 
vealed that applying the content presented to 
real-life research workflows is a real challenge for 
students. Even though they clearly understood
the material presented—and even recognized 
its importance—students did not incorporate 
data management practices into their everyday 
workflow. Future plans include collaborating 
with the faculty and students to incorporate 
these skills into standard lab practices. 


LITERATURE REVIEW AND 
ENVIRONMENTAL SCAN OF DATA 
MANAGEMENT BEST PRACTICES 


The literature review focused primarily on wa- 
ter and hydrology disciplinary data manage- 
ment resources, though the interdisciplinary 
nature of the lab’s work led us to include eco- 
logical and biological research resources as well. 
The literature showed that students had little 
experience with creating metadata (Hernandez, 
Mayernik, Murphy-Mariscal, & Allen, 2012). 
The most useful information for our back- 
ground review came from the Consortium of 
Universities for the Advancement of Hydro- 
logical Science, Inc. (CUAHSI) organization 
(http://www.cuahsi.org/). Created in 2001 by 
the National Science Foundation, CUAHSI 
is the water-science community response to 
“the need to organize and extend the national 
and international research portfolio, par- 
ticularly to develop shared infrastructure for 
investigating the behavior and effects of wa- 
ter in large and complex environmental sys- 
tems” (CUAHSI, 2010). The consortium lists 
a number of points in its mission statement 
that are crucial to addressing better access to 
data, including creating and supporting re- 
search infrastructure and increasing access to 
data and information. Its strategic plan lists 
four data access goals, which demonstrate the 
forward thinking of the organization: 


1. Develop and maintain search services for 
diverse sources of data and the underlying 
metadata catalogs (building on and ex- 
tending from the Hydrologic Information 
System—HIS), including an access portal 
and coordination with providers of water- 
related information 

2. Develop a mechanism for citation and use 
tracking to provide professional recogni- 
tion for contributions to community data 
archives 

3. Solicit community input on emerging data 
needs and facilitate access to new types of 
data 

4. Coordinate development, promotion, and
adoption of metadata standards between
universities, governmental agencies, and
the private sector for interpreted data
products (e.g., potentiometric surfaces, areal
estimation of precipitation, and input-output
budgets). (CUAHSI, 2010, p. 18)


Perhaps the most interesting area to note 
in the CUAHSI strategic plan is its continued 
development of metadata standards. CUAHSI 
recognizes the need for a shared language for 
both researchers and information systems to 
communicate to other researchers and infor- 
mation systems. To this end, the consortium 
is expanding the CUAHSI Hydrologic Infor- 
mation System (HIS), a Web-based portal for 
accessing and sharing water data (CUAHSI, 
2013). The HIS operates with two important
metadata standards: the Water Markup
Language (WaterML; OGC, 2013), which is an open
metadata schema created by the San Diego
Supercomputer Center for hydrological time
series and synoptic data, and the Federal
Geographic Data Committee (FGDC) metadata
schema (FGDC, 1998) created for geographic
information system (GIS) and spatial data.
Other metadata and data practices include the 
well-developed schema of the Ecological
Metadata Language (EML), originally developed by
the Ecological Society of America for ecology 
and related disciplines (Knowledge Network 
for Biocomplexity, n.d.b). Although not specif- 
ically created for hydrology, the EML metadata 
standard uses similar descriptions and requires 
an understanding of geospatial needs that are 
specific to the hydrology discipline, more so 
than more general standards such as Dublin 
Core (Dublin Core Metadata Initiative, 2013). 
Additionally, this Purdue DIL team consulted 
very useful EML tools, such as the Morpho 
data management application, a download- 
able metadata entry template (Knowledge Net- 
work for Biocomplexity, n.d.a), when creating 
a metadata exercise for the graduate students. 
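
To give a sense of what even a very small metadata record involves, the sketch below builds one using element names from the Dublin Core Metadata Element Set and prints it as JSON. Dublin Core is used here only because it is compact; as noted above, it is more general than EML and says far less about geospatial detail. The `build_record` helper and every value in the record are invented placeholders, not data from the research group.

```python
import json

# The fifteen element names of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def build_record(**elements):
    """Return a metadata record, rejecting names that are not Dublin Core elements."""
    unknown = sorted(set(elements) - DUBLIN_CORE_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {unknown}")
    return dict(elements)


# All values below are invented placeholders for illustration only.
record = build_record(
    title="Example stream discharge observations",
    creator="Example Graduate Student",
    date="2012-09-15",
    format="text/csv",
    coverage="Example watershed, Indiana",
    description="Daily discharge readings from a hypothetical field site.",
)

# Print the record as JSON; in practice it could be saved next to the data file.
print(json.dumps(record, indent=2))
```

Even six fields like these answer the first questions a later user will ask: what the data are, who made them, when, where, and in what format.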

Since the greatest needs for our research group 
focused on metadata and laboratory standard 
operating procedures for data management, 
we consulted Qin and D’Ignazio (2010), who 
provided details of a metadata-focused scientific 
data course of study. Stanton (2011) described 
the duties of practicing e-science professionals, 
which provided a foundation in actual tasks 
that scientists undertook in the course of man- 
aging data. Finally, the EPA (2007) provided a 
solid introduction to the purpose and process of 
creating standard operating procedures, which 
were applied to the student activities. 


CASE STUDY OF GRADUATE 
STUDENT DATA INFORMATION 
LITERACY NEEDS IN AGRICULTURAL 
AND BIOLOGICAL SCIENCES 


The hydrology research groups consisted of 
two faculty members who focused on the in- 
tegration of field-based observations, remote 
sensing, and hydrology models to increase
understanding of land-atmosphere interactions
and the hydrologic cycle. Their work
requires the acquisition of different kinds of 
data and the ability to convert data to ensure 
interoperability. The primary faculty member 
understood the importance and significance 
of good data practices, but still struggled with 
achieving high-quality data management in 
the research groups. The data collected in the 
lab ran the gamut of data types. On the one
hand, the lab manually collected water samples
and analyzed the results, tracking their
processes with print lab notebooks that were
later scanned into electronic formats. On the
other hand, the group also downloaded remote
sensing data from external sources, which were
fed into computer models that created large
data files in the process. Managing these three
types of data (field samples, external remote
sensing data, and computer simulations)
provided constant challenges, especially as the
students gathering or processing each different
kind of data communicated their results with
each other.

To understand the needs of the graduate 
students, the Purdue DIL team conducted six 
interviews between April and June of 2012. 
We used the DIL interview protocol (available
for download at http://dx.doi.org/10.5703/1288284315510).
This is a semi-structured
interview instrument that allows for follow-up 
and clarification questions. The Purdue DIL 
team interviewed the primary faculty member 
(Faculty A), from the Department of Agricul- 
tural and Biological Engineering (ABE). We 
then interviewed five ABE graduate students (a 
mix of master’s and PhD students) working in 
this faculty member’s research group. (Note: A 
second faculty member [Faculty B] and other 
graduate students working on their research 
team could not be reached for interviews but 
were included in the educational program. 
This second faculty member was included in all 
subsequent actions and discussions in creating
instructional content and assessments.) 

One reason that our team approached Faculty
A to be part of this project was that
he had already expressed concern about teaching
data management and data literacy skills to
graduate students as part of the education,
acculturation, and training process of graduate school.
He was already familiar with many data literacy
skills, generally through seeing the consequences of
their absence: students had lost data because of
a lack of proper backups, poor description, and
poor organization of files. For example, he described:


I have been slowly developing a data manage- 
ment plan after our conversations over the 
last couple of years, . . . [but one] that’s more 
in my head. . . . But I think just the general 
conversation has clarified in my head that 
rather than just repeating over and over again 
to my students what they should be doing, 
having a written statement certainly helps. 
And then when they get in trouble, like the 
student who was saving everything on their 
external USB hard drive, I [can] point back 
to the data management plan that says [they]
weren’t allowed to do that.


He further described:


I tried to establish a naming convention,
but nobody ever listens to the naming conventions,
so next thing you know you’ve got
five files labeled “Final 1,” “Final 2,” “Final
A,” “Final C.” So we keep running into this
problem with stuff that people who have left,
right? So what is this file? We’ve got three files
that look identical except for the “Final” variation
name. Which one is it?
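
A written naming convention becomes easier to follow when it can be checked automatically. As a minimal sketch, the Python code below tests file names against one hypothetical pattern, `<project>_<kind>_<YYYYMMDD>_v<NN>.<ext>`, where the kind is one of the three data types described earlier in this chapter (field, remote sensing, or model). The pattern, the project name, and the example file names are assumptions made for illustration; they are not the convention Faculty A used.

```python
import re

# Hypothetical convention: <project>_<kind>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<kind>field|remote|model)_"
    r"(?P<date>\d{8})_"
    r"v(?P<version>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def nonconforming(names):
    """Return the file names that do not follow the convention."""
    return [name for name in names if NAME_PATTERN.match(name) is None]


# Invented example file names.
files = [
    "demo_field_20120915_v01.csv",
    "demo_model_20121003_v02.nc",
    "Final 1.xlsx",
    "Final A.xlsx",
]

for name in nonconforming(files):
    print("does not follow the convention:", name)
```

A check like this could be run over a shared directory before each lab meeting, so that ambiguous names are caught while the person who created the file is still around to explain it.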


Faculty A also experienced difficulties with 
understanding or obtaining the lab’s data from
students after their graduation. He explained:


I had a student in my first couple of years who 
[collected] field data for me, and I didn’t have 
a written plan. He didn’t follow my [verbal] 
plan, and so he left with all of the material.
. . . I’ve had a couple of people ask me about
that data and what was available and it’s like,
well, I’ve never actually seen it.


Faculty A offers a class on environmental informatics.
Most of the skills in the course are
generally not taught to graduate students before
they enter the lab, unless they are picked
up informally from other advisors or students.
The class covers general best practices for research
as well as many discipline-specific items.
Even so, one of Faculty A’s primary
concerns was that students were not receiving
any data training outside of his lab or in
their course work. Additionally, all his research
group students were in the ABE department
studying some aspect of hydrology but from a
variety of angles: using field or observed data,
using remote sensing data, or creating models.
This meant that it was difficult to create and
enforce a one-size-fits-all approach to a written
data management plan (DMP). Faculty A stated:


So I think if you have a lab-based kind of 
group, then they probably have some meth- 
odology that they lay out in a lab book, but it’s 
harder when it’s—you know—a small group 
and people are doing different things. This 
is the dilemma for me. I’ve got one gradu- 
ate student who's doing mostly remote sens- 
ing work. I’ve got a couple of grad students 
who are going to do more observational work. 
And then most of them are doing modeling 
work. . . . [I]t becomes more individualized, 
right? It’s harder to invest the time to come
up with the documentation [for data management]
because it’s [for] one or two people. But
the problem is that those one or two people
become somebody else [grad students replacing
current] or maybe multiple people at some
point, right? So we need to be capturing this.


To help with this problem, Faculty A had introduced
students to some general data management
policies on a wiki site once they started in
his lab. When interviewed, students all displayed
some awareness that there were formal data management
policies in place within the research
group. However, they also reported varying
degrees of compliance, sometimes because they
were not sure the policies applied to their specific data
situation. One graduate student said:


Yes we have a wiki site. [The faculty advisor] 
lists all of the procedures that we need to fol- 
low. . . . (Laughs) But I think I do not follow 
that, because my data is too large and it’s very
difficult to ask Purdue to extend my space.


Our interviews also included ratings of
the DIL competencies. Here, both the faculty
and the graduate students interviewed rated
most of the DIL facets as important (see Figure
6.1). The highest rated concepts by the students
were discovery and acquisition, data processing
and analysis, and data management and
organization, with ethics and attribution, data
visualization and representation, and metadata
and data description very highly rated as well.


A MULTI-SESSION INSTRUCTION 
APPROACH TO DATA INFORMATION 
LITERACY SKILLS 


In developing our DIL program, we dis- 
cussed with both of the faculty members the 
nature and extent of instruction needed by
their students. The discussion centered on the
highest priority skills needed by the students, 
which skills would best be facilitated by librar- 
ian partners, and which skills, if successfully 
learned, would have the greatest impact on the 
research group overall. We also discussed how 
much time would realistically be available for 
face-to-face instruction, so that we could make 
the best use of the research groups’ time. With 
a total of 2 faculty members and 13 students, 
each with their own academic schedule, the fac- 
ulty found it challenging to find dates and times 
for even an hour-long group meeting a week.

We settled on a three-part instructional
strategy that included some prep work prior 
to the face-to-face session and homework for 
the students to complete following the session. 
Given the time constraints, the DIL team felt 
that we should concentrate on just the most 
important and directly applicable DIL skills for 
which the librarians had unique expertise. Con- 
sequently, we decided to focus our instruction 
on discovery and acquisition, data management 
and organization, ethics and attribution, and 
metadata and data description as the remaining 
high-impact fundamental areas from the sur- 
vey. While additional topics such as data visu- 
alization and representation and data processing 
and analysis were important, they might best 
be taught by the faculty members themselves.

It became apparent that, while the research
group had a preliminary set of data manage- 
ment policies, these policies were not well 
understood or adhered to by the graduate stu- 
dents. Thus, we determined that one way to 
provide a scaffold for the DIL topics would 
be to develop standard practices for handling 
data in the research group. From the literature 
review and environmental scan, we concluded 
that these standards must be developed col- 
laboratively to ensure maximum adoption by 
the group. In short, our goal was to help the 
group establish its own community standards. 

Figure 6.1 Average DIL competencies ratings for the agricultural and biological sciences case
study. Ratings based on a 5-point Likert scale: 5 = essential; 4 = very important; 3 = important;
2 = somewhat important; 1 = not important.


To increase the authenticity of the exercises, 
each of the instructional activities focused on 
students tackling the actual problems of their 
group using the content presented in class. 


RESULTS OF THE FALL 2012 
INSTRUCTION SESSIONS 


On the basis of our findings, our team decided 
to give three presentations to the combined re- 
search group over a 3- to 4-month period in 
the fall of 2012. Our approach was to fold the 
instruction into the regular meeting schedule 
to make the DIL material part of their work- 
flow, rather than as something extra or outside 
of what they would have to do as a group any- 
way. Faculty A and Faculty B’s research groups 
met together biweekly, so our team worked 


with them at every other meeting, or roughly 
once a month, starting in September, for a total 
of three sessions. 

The topics for the three sessions included 
(1) developing a data checklist modeled on a 
standard operating procedure or laboratory
protocol format, (2) searching for data in exter- 
nal databases, and (3) creating metadata. The 
learning objectives for each session are listed in 
Table 6.1 and the following sections detail the 
sessions. 


Session 1: Data Checklist/Standard 
Operating Procedures 


The aim of Session 1 was to teach the stu- 
dents to articulate the relevant components 
of a standard operating procedure and to ap- 
ply those components when creating the actual 
procedures for the research group. In earlier 

TABLE 6.1 Learning Objectives of the Fall 2012 Library Instruction Sessions

Session 1
Topic: Data checklist/standard operating procedures
Learning Outcomes: Students are able to articulate the relevant components of a standard
operating procedure and apply those components to create an actual
procedure for the research group

Session 2
Topic: Searching for external data
Learning Outcomes: Increased student appreciation for the value of metadata in locating data
from external sources, and as a corollary, the importance of applying
metadata to their own data sets so others can find (and cite) them in their
own research

Session 3
Topic: Creating metadata
Learning Outcomes: Students are able to analyze their own data sets and determine
appropriate metadata to describe those sets. Students would then be able
to curate their data within the structure of Purdue's data repository


discussions with Faculty A, he mentioned that 
something as simple and straightforward as a 
checklist for the kinds of data that might be 
collected would be a good approach. This could 
outline all the types of data needed while
providing an overview of those data.
Faculty A created an initial checklist for the 
three categories of data collected: field obser- 
vation data, remote sensing data, and model 
simulation data. Each category was unique and 
therefore had a different checklist governing its 
organization. Initially, each checklist contained 
7 to 15 elements. For example, the field obser- 
vation data checklist included the following in- 
formation and data elements for organization 
and management: 


• Field notebooks: scanned copies of all
  pages related to activities
• Digitized notes and measurements from
  field notebooks
• Raw files downloaded from field equipment
• Changes to sample control program (text
  file)
• Photos of sample sites
• IDs associated with physical samples, if
  collected
• Lab analysis results for all physical samples


The original checklist was meant to be a 
step-by-step list of things that a student might 
do to properly capture and describe all the 
data gathered in an instance of field observa- 
tion. However, after discussions with the fac- 
ulty collaborators, we determined that the 
checklists gave insufficient or ambiguous direc- 
tions, which was why students did not find the 
checklists useful. 

The DIL team started the session by having 
students recall when they started in the group 
and what information they would have liked 
to have about the data they were working 
with from the previous students. We brain- 
stormed the attributes that were important to 
them (e.g., units, weather conditions, analy- 
sis techniques, calibration information) and 
used that to set the stage for determining how 
they could provide that information about 
the data they were collecting or producing. 
We also introduced some examples of best 
practices in standard operating procedures to 


show students how to translate their needs for 
information into an actual set of steps/activi- 
ties that would lead to the production of that 
information. 

The team followed up the instruction with 
an exercise using these checklists. To have the 
students gain ownership of the checklists, the 
team asked students which elements were miss- 
ing. This generated some initial suggestions, 
and then we broke the students into three 
groups based on which of the three checklists 
matched most closely with the type of work 
they did within the research group. Some stu- 
dents matched with two or even all three areas, 
so they self-selected which group they wanted 
to join based on their interest or to help bal- 
ance the group sizes. The faculty members each 
joined one of the groups. The groups were then 
asked to work with their assigned checklist in 
more depth, adding to it and documenting the 
most realistic way it could be implemented in 
current workflows. Their homework was to fin- 
ish their checklist and share their work with the 
group in 2 weeks. Each group took a slightly 
different approach; the two groups with the 
professors as members were more thorough 
than the third group. The third group possibly 
lacked the pressure, the focus, and the expertise 
of having their instructor as a member of their 
work group. 

The three resulting checklists are in Appen- 
dix A to this chapter, and the entire research 
group continues to work toward incorporat- 
ing the data checklists into their regular work- 
flow. Overall, the team found that the final, 
community-driven checklists were greatly 
improved over the faculty member’s original 
draft. They exhibited more detail and less am- 
biguity, and they showed that students could 
transfer the content of the instructional ses- 
sion to documentation that was directly rel- 
evant to their lab. 
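
The checklist idea from Session 1 can be sketched as a small data structure. The following is only an illustration, not the research group's actual tooling: the item wording comes from the field observation checklist described above, while the class name, the helper functions, and the example file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChecklistItem:
    """One line of a data archiving checklist (hypothetical structure)."""
    description: str
    # File names, dates, or IDs recorded against this item.
    entries: List[str] = field(default_factory=list)

    @property
    def done(self) -> bool:
        # An item counts as complete once at least one entry is recorded.
        return bool(self.entries)


# Item wording taken from the field observation checklist in the text.
FIELD_OBSERVATION_ITEMS = [
    "Field notebooks: scanned copies of all pages related to activities",
    "Digitized notes and measurements from field notebooks",
    "Raw files downloaded from field equipment",
    "Changes to sample control program (text file)",
    "Photos of sample sites",
    "IDs associated with physical samples, if collected",
    "Lab analysis results for all physical samples",
]


def new_checklist(descriptions):
    """Create a blank checklist from a list of item descriptions."""
    return [ChecklistItem(d) for d in descriptions]


def missing_items(checklist):
    """Return the descriptions of items that have no entries yet."""
    return [item.description for item in checklist if not item.done]


checklist = new_checklist(FIELD_OBSERVATION_ITEMS)
checklist[2].entries.append("logger_download.dat")  # hypothetical file name
checklist[4].entries.append("site_photos/")         # hypothetical folder name

for description in missing_items(checklist):
    print("MISSING:", description)
```

A structure like this makes the gaps in a data package visible before a student leaves the group, which is the problem the checklists were meant to address.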


Session 2: Searching for External Data 


For the second session, the goal was to increase 
student appreciation for the value of meta- 
data in locating data from external sources, 
and as a corollary, the importance of applying 
metadata to their own data sets so that oth- 
ers can find (and cite) them. After debriefing 
the checklist homework from the first session, 
which provided reinforcement of the core con- 
cepts of standard operating procedures, the 
second class introduced the Ecological Meta- 
data Language or EML, and Morpho, the tool 
for describing data sets using EML. Although 
the Water Metadata Language (WML) at first 
seemed to be the best fit with the hydrology 
group, and may prove to be in the long run, 
the WML tools were not yet as fully developed 
nor as user-friendly as those provided for EML. 
The DIL team began the discussion with the 
“peanut butter sandwich exercise” (i.e., to write 
down the instructions to make a peanut but- 
ter sandwich and then have someone else carry 
out those instructions explicitly). This demonstrated
how description can make a difference
in how well individuals understand procedural
processes and illustrated the need to be explicit
and complete when describing something.
Next, we drew parallels between the description
exercise and metadata. Here we discussed how
well-documented metadata could help some- 
one else understand a data set—from how it 
was gathered to how it was analyzed—and its 
greater meaning in the context of other data. 
Students were divided into small groups and 
asked to search the Knowledge Network for 
Biocomplexity (KNB) data registry using Mor- 
pho to find a data set that might be relevant to 
them. This was challenging for many students:
the very specific keywords that they tried were
often unsuccessful, while very general keywords
such as “water” succeeded. The records retrieved
for “water” illustrated how
helpful more precise and in-depth descriptions
would have been for the searcher.

In the end-of-class assessment, we asked stu- 
dents what they learned, what they will begin 
to incorporate into their own work, and what 
was still unclear (see Appendix B to this chap- 
ter for the assessment tool). Almost all students 
responded that they had a deeper understand- 
ing of how important metadata could be in 
describing their data to others and as a way 
for others to locate their data. They also ap- 
preciated the need to be explicit in their own 
descriptions of their data so that searchers can 
determine if and how the data might be useful 
to them. The results of these self-assessments, 
reinforced by the instructors’ observations of 
the students while searching for external data, 
aligned very well with the learning outcomes. 
The students saw clearly that poor description
could make another researcher’s data difficult, 
if not impossible, to reuse, and this set the stage 
for what they would learn in Session 3, creating 
their own metadata. 


Session 3: Creating Metadata 


We designed the third session for students to 
be able to analyze their own data sets and de- 
termine appropriate metadata to describe those 
sets within the structure of an online reposi- 
tory. To demonstrate this, students were asked 
to submit their own data to our institutional 
data repository, the Purdue University Research 
Repository (PURR), and to create a brief meta- 
data record to describe it. We asked students to 
bring a sample of their data to this session. A 
data scientist introduced the students to PURR 
and described the basic principles of what a 
repository could do for their submitted data. 
After a brief walk-through on the mechanics 
of getting started, which included creating an 


account in PURR, each student and the two 
faculty members created a project space. The 
PURR project space allows users to designate 
individuals with various roles such as “collabo- 
rators” or “owners,” and allows owners of the 
project space to provide access to the materials 
in their project space to selected individuals. 
Each participant then uploaded his or her data 
file to the project space. 

For each file uploaded, PURR requires very 
basic metadata, based on the Dublin Core 
metadata standard (http://dublincore.org), 
for description. Because the metadata that
PURR asks for is so general, we
decided to add a more sophisticated,
discipline-appropriate metadata assignment
to the class. For this assignment, the libraries’
metadata librarian created a Web-based
form based on EML (see Appendix C to this
chapter) and asked students to fill it out and
include it with their data submission to PURR.
The 15-field metadata form included subject- 
based items such as geographic coordinates, 
temporal coverage, methods, and sampling 
units, as well as more general items like key- 
words, abstract, data owners, and data con- 
tacts. This information automatically popu- 
lated an Excel file that could be repurposed 
as a supplementary document for the data 
deposited into PURR. Unfortunately, at the 
time PURR did not accommodate custom 
metadata fields as a part of its metadata regis- 
try. So the metadata had to be downloaded as 
a separate text file for a potential user of the 
data to take full advantage of the EML infor- 
mation provided by the author. The metadata, 
if properly qualified, could also be inserted 
into a bibliographic data repository, such as 
the KNB data registry, using their metadata 
software, Morpho. However, the students 
were not asked to take that extra step due to 
time constraints. 
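
As a rough illustration of how a form record can become a supplementary text file, the sketch below serializes one metadata record to CSV. It is not the form the metadata librarian built: the field names are limited to the items named in the paragraph above (the real form had 15 fields), and the function name and example values are hypothetical placeholders.

```python
import csv
import io

# Field names follow the items named in the text; this is not the full form.
FIELDS = [
    "geographic_coordinates",
    "temporal_coverage",
    "methods",
    "sampling_units",
    "keywords",
    "abstract",
    "data_owners",
    "data_contacts",
]


def record_to_csv(record: dict) -> str:
    """Serialize one metadata record as CSV text (header row plus one row)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    # Fields missing from the record are written as empty strings.
    writer.writerow({name: record.get(name, "") for name in FIELDS})
    return buffer.getvalue()


# Hypothetical placeholder values, for illustration only.
example = {
    "keywords": "streamflow, hydrology",
    "abstract": "Placeholder abstract for a sample data set.",
    "data_owners": "Placeholder owner name",
}

csv_text = record_to_csv(example)
print(csv_text)
```

A plain-text file like this can travel with the deposited data even when the repository itself only stores general-purpose fields.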
This exercise required students to think
about how best to describe their data for any- 
one other than themselves. This required them 
to capture their tacit knowledge and internal- 
ized assumptions about a data set—knowledge 
that must also be passed along to another in- 
dividual, even someone they may be working 
closely with, in order for them to understand 
the data. DIL team members reviewed the stu- 
dents’ metadata submissions and offered sug- 
gestions for improvement. Although students 
were reluctant to do additional metadata entry 
when depositing their data, the convenience 
and straightforwardness of the online form im- 
proved students’ willingness and confidence to 
complete this task successfully. In the future, as 
the use of WML continues to increase and as it 
becomes more robust, we recommend using an 
online metadata form with fields from WML, 
or a blend of EML and WML, if that would 
be appropriate, for a broader audience of data 
submitters. 

Although students said that they under- 
stood the need for good descriptive metadata, 
they were not quick to fill out the metadata 
template that we provided. Students were 
prompted several times to complete the form, 
and 10 out of 12 finally submitted the form. 
When filling out the forms, students succeeded 
in writing descriptive methods, study extent, 
and sampling procedures, and to a lesser extent,
in providing keywords (perhaps because
completing these tasks is already a familiar
exercise when writing papers for journals).
Additionally, they were very thorough in de- 
scribing geographic coverage. This may not be 
surprising given the geographic focus of their 
research. Students were less successful when 
listing data owners, contacts, and affiliated 
parties, even though this was covered in class. 
Understanding who owns the data and what 
roles they “officially” play in creating the data 


was a complicated aspect of describing data. 
This is an area that the team intends to cover 
more fully in future sessions. Overall, the team 
will need to find ways to work with the fac- 
ulty members to insert the metadata template 
into an existing workflow, so that students do 
not see this merely as something externally im- 
posed and extra work. 


DISCUSSION 


The integrated lab-meeting approach was gen- 
erally successful and contained elements that 
could be replicable for a wider audience. The 
exercise of creating checklists to address data 
management and organization skills, though 
the results here are specific for these research 
groups, is a general approach that could be 
used by other labs or researchers. Any lab or 
work group can generate the detailed list of 
items that need to be captured or addressed 
in the data gathering process. Also, with the 
faculty-student-librarian team approach used 
in the DIL project, this list can be developed
so that there is a feel- [...] -tise and an understanding
of what information
absolutely has to be collected. Students bring
an operational perspective of how the data are 
incorporated into the data collection; they are 
often the ones performing the collection tasks 
and can identify ways to streamline the pro- 
cess. Finally, librarians bring the DIL exper- 
tise to facilitate the discussion between faculty 
and students as well as to optimize the accessibility,
internal consistency, and organization
of the data.

Even before the DIL project began, the disci- 
plinary faculty member believed that metadata, 
or some description of the data, was critical. 
He had experienced too many instances where 
one student’s data could not be understood, 
by himself or by others, due to inadequate de- 
scription. Sometimes this was reparable after 
many hours spent trying to reconstruct what 
the data represented; other times the data were 
simply lost or unusable due to the fact that the 
description could not be recovered or the stu- 
dent had graduated and taken the data. Our 
instruction sessions covering the importance 
of good data description and specific metadata 
tools positively impacted the students’ under- 
standing of the issue. In their assignments the 
students demonstrated their understanding 
of how poor metadata could make a data set 
useless to anyone other than the creator. They 
applied this knowledge when creating better 
metadata for their own data descriptions meant 
for a broader audience. 

Despite this appreciation, the students still 
needed metadata tools to guide this process if 
they were to be successful. Creating the online 
tool for entering modified EML metadata in- 
creased the likelihood that they would actually 
adopt this new step in the data management 
process. The DIL team would like to make the 
metadata more usable, so that others might 
take advantage of the work that the students 
put into describing their data. Currently, saving
the EML metadata as an Excel file does not
take full advantage of the power of the descriptive
language; therefore, developing a more robust
online entry form and/or brokering the metadata
to disciplinary-specific repositories will help
students appreciate the value of their work.
Ultimately, search tools that take advantage of the
descriptive metadata can lead to greater reuse
of the data by others.

However, getting the students to adopt these 
practices into their everyday workflow was 
a challenge, and we had limited success with 
this during the project. In hindsight, recogniz- 
ing adoption as one of the greatest barriers, we 
might have worked with the students from the 
beginning to incorporate these practices into 
their research workflows. In tandem, we might 
have worked more closely with the faculty to 
create a structure, higher expectations, and 
a process for implementing the DMP within 
the lab. However, the adoption of these new 
practices might simply take time. It could be 
that regular use of the practices will eventually 
become habit. Additionally, asking the faculty 
partners to enforce the new practices through 
regular and frequent monitoring will likely pay 
off in the long run with regard to adoption. As 
these practices become “business as usual” they 
will transfer easily to new students as they cycle 
into the research groups and formal training 
for one student becomes peer-to-peer learning 
for the next. 


CONCLUSION 


Overall, this DIL team felt that the program 
was very successful in communicating DIL 
concepts and impressing upon graduate stu- 
dents the importance of good data practices. 
Implementation is still a work in progress, as 
the faculty researchers are in the best position 
to address accountability in order to embrace 
the practices that the group has developed. 
That said, there have been robust conversa- 
tions within the research group about the need 
for improving data management, and all of 
the members of the group are speaking from 
a higher level of understanding than they had
previous to the project. The DIL model works 
best when contextualized to the needs of the 
target audience. Hands-on activities aligned 
with the goals of the research group extended 
what they were already doing or trying to do, 
which gave them more tools and concepts to 
apply to their research environment. At the end 
of the instructional program, students had tan- 
gible results that included standard operating 
procedures for the lab and data sets submitted 
to a repository. 

As we reflect on the activities, data man- 
agement and organization (standard operating 
procedures) and metadata and data description 
(describing and depositing data sets into a re- 
pository) jump out as the areas that found the 
most traction within the research group, and 
might be the driving principles for a more gen- 
eral DIL model in this discipline. Also, while 
library and information science professionals 
may focus on the need to share data and make 
it openly available, the focus among research- 
ers is shifted more toward sharing data and 
making it accessible mainly within the research 
group. Therefore, when stressing the value of 
data management skills, highlighting the ben- 
efit to the research group is key. 

In the course of the activities, we discovered 
that much of the data in distributed reposito- 
ries is not well described, so locating and using 
that data is a continuing challenge. As a result, 
researchers may gravitate toward centralized, 
well-stewarded data, such as that
produced by government agencies. For many
“small science” areas, the lack of quality knowl- 
edge management systems provides challenges 
for the successful interoperability and shar- 
ing of data among research groups. The lack 
of good metadata limits progress in this area, 
as there are few examples of best practices in 
action in the disciplinary data repositories for 
their community. 


Finally, this case study found that graduate 
students have no trouble grasping the con- 
cepts of DIL when the concepts are presented 
to them. However, getting students to change 
current practices, whether on their own or in a 
group setting, is an ongoing challenge. It is unclear
whether this is due to a lack of emphasis
on data management in the lab, to faculty
not stressing the need, or to students being
neither comfortable with nor knowledgeable about
how to adjust current practice. The important
conclusion is that our educational approach of 
modules was not enough to ensure implemen- 
tation of best practices. Further research and 
development is needed to address how students 
and faculty can not only learn the skills in- 
volved with DIL, but implement the DIL best 
practices as well. 
APPENDIX A: Data Archiving Checklists for Session 1 of the
Agricultural and Biological Sciences Case Study 


These checklists were generated by the students and faculty in the Agricultural and Biological Sci- 
ences case study of the DIL project. They include checklists for handling the three types of data 
generated by the research group: (1) Field Observation Data, (2) Remote Sensing Data, and (3) 
Simulation Model Data. 


Data Archiving Checklist 
Field Observation Data 


Field notebooks—scanned copies of all pages related to activities 
Date scanned: 
Date scanned: 
Date scanned: 
Digitized notes and measurements from field notebooks 
Date scanned: 
Date scanned: 
Date scanned: 
Raw files downloaded from field equipment 
Date downloaded: 
Date downloaded: 
Date downloaded: 
Changes to sample control program (text file) 
Text file name: 
Photos of sample sites 
Photo files stored: 
IDs associated with physical samples, if collected 
ID: 
ID: 
ID: 
Lab analysis results for all physical samples 
Files stored: 
Files stored: 
Files stored: 
Associated remote sensing data? 
Notes: 
Associated simulation data? 
Notes: 
Processed Files
Name of file: 
Quality control program: 
Outside data sources: 
Photographs of samples: 
Writing/compiling data: 
Order of Processing: 
Formats, fields, missing data, processing, units, time, how collected, where collected, 
weather conditions 
Simulation data: inputs, sim software used (version), size/resolution/scale, format, fields, 
units 
Remote sensing: resolution-temporal, spatial; when collected; name of sensor; cloud/ 
weather, calibration, projection, file type—raster/shape 
Metadata file: Data dictionary 


Data Archiving Checklist 
Remote Sensing Data 


Remote sensing platform(s) and sensor(s) used, and status 
Platform/sensor/status: 

Platform/sensor/status: 
Platform/sensor/status: 

Raw remote sensing files (DNs) 
DN file: 

DN file: 
DN file: 

Atmospheric conditions, including radiosonde or other vertical profile data; output from 
data assimilation models; weather maps—collect all available data 
Notes: 

All files/information required to georegister imagery 
Files stored: 

Files stored: 

Files stored: 

Radiance files, not georegistered 
Files stored: 

Files stored: 

Radiance files, georegistered 
Files stored: 

Files stored: 

Final imagery analysis products 
Files stored: 

Files stored: 
Documentation of all steps taken in processing remote sensing images to final form
Atmospheric corrections 
Emissivity corrections 
Georegistration process 
Classification or analysis methods 
Associated field observation data? 
Notes: 
Associated simulation data? 
Notes: 
Data dictionary 


Data Archiving Checklist 
Model Simulation Data 


Model inputs (all inputs should be for simulations used in analysis) 

Meteorology 
File stored: 

Vegetation 
File stored: 

Soils 
File stored: 

Global control file 
File stored: 

Streamflow routing model input files 
File stored: 

File stored: 
File stored: 
Model evaluation data 

Observed streamflow 
File stored: 

Other observation types 
Observation type/file stored: 
Observation type/file stored: 

Model version 

Hydrology model source code as used in the simulations 
Source code file stored: 

Routing model source code 
Source code file stored: 

Source code from other models used 
Source code file stored: 

Source code file stored: 

Model analysis products 
Raw model simulation output

For very large data sets, a filename should be provided and a location on fortress 
File stored: 

For smaller data sets, all output files should be migrated into HDF5 or tarred into a 
single file 
File stored: 

Files that have been developed from the raw model output and that were the basis of 
analysis (e.g., output from the HDF5 summary statistics program), especially if they 
contain additional information not used in the final published product but could be 
used for additional analysis 
File stored: 

File stored: 

All data files used to develop graphics or tabular data 
File stored: 

File stored: 
File stored: 

Scripts used to develop published graphics or tabular data 
Script: 

Script: 
Script: 

High-quality EPS (preferred since they can be edited for minor changes), PNG (figures), 
or JPEG (pictures) files of published figures. 
EPS file: 

PNG file: 
JPEG: 
Associated field observation data 
Notes: 
Associated remote sensing data? 
Notes: 
What not to do format/units 
Metadata document/data dictionary 
APPENDIX B: Assessment Tool for Session 2 of the
Agricultural and Biological Sciences Case Study 


The Data Information Literacy (DIL) team used the following tool to assess the students’ response 
to each of our three sessions. 


1. Briefly describe what you learned in today’s session: 

2. List one thing that you will definitely incorporate into your own data gathering/descrip- 
tion/management after today: 

3. Briefly describe anything that was discussed today that is still unclear for you: 


APPENDIX C: Metadata Form for Session 3-Data Package Metadata 


Enter the title of the data package. The title field provides a description of the data that is long 
enough to differentiate it from other similar data. 

Title:* 

Enter an abstract that describes the data package. The abstract is a paragraph or more that 
describes the particular data that are being documented. You may want to describe the objec- 
tives, key aspects, design, or methods of the study. 

Abstract:* 

Enter the keywords. A data package may have multiple keywords associated with it to enable 
easy searching and categorization. In addition, one or more keywords may be associated with a 
keyword thesaurus, taxonomy, ontology, or controlled vocabulary, which allows the association 
of a data package with an authoritative description definition. Authoritative keywords may also 
be used for internal categorization. An example of an authoritative thesaurus is the National 
Agricultural Library Thesaurus: http://agclass.nal.usda.gov/dne/search.shtml 

Authoritative keyword source. If an authority was used for the keywords, identify by name the 
authority source. 

Keywords (separate with commas):* 

Enter information about the owners of the data. This is information about the persons or or- 
ganizations certified as data owners (e.g., principal investigator for a project). The list of data 
owners should include all people and organizations who should be cited for the data. Minimally 
include full name, organization name, owner address, and e-mail. 

Data Owners:* 

Enter information about the contacts. This is information about the people or organizations 
that should be contacted with questions about the use or interpretation of your data package. 
Minimally include full name, organization name, contact address, and e-mail. 

Contacts:* 

Enter associated parties’ information. These are persons or organizations functionally associated 
with the data set. Enter the relationship. For example, the person who maintains the data has an 
associated function of “custodian.” Minimally include functional role, full name, organization
name, party address, and e-mail. 

Associated Party: 

Is your data set part of a larger umbrella project? Data may be collected as part of a larger 
research program with many subprojects, or they may be associated with a single, independent 
investigation. For example, a large NSF grant may provide funds for several primary investigators
to collect data at various locations.

If part of a larger project, identify by name the project. If applicable, include funding agency 
and project ID. 

Enter a paragraph that describes the intended usage rights of the data package. Specifi- 
cally, include any restrictions (scientific, technical, ethical) to sharing the data set with the public 
scientific domain. 

Usage rights:* 

Enter a description of the geographic coverage. Enter a general description of the geographic 
coverage in which the data were collected. This can be a simple name (e.g., West Lafayette, In- 
diana) or a fuller description. 

Geographic coverage:* 

Set the geographic coordinates which bound the coverage or a single point. Latitude and
longitude values are used to create a “bounding box” containing the region of interest (e.g., de- 
grees/minutes/seconds N/S/E/W) or a single point. 

Bounding box or point: 

Enter information about temporal coverage. Temporal coverage can be specified as a single 
point in time, multiple points in time, or a range thereof. 

Temporal Coverage:* 

Enter method step description. Method steps describe a single step in the implementation of a 
methodology for an experiment. Include method title, method description, and instrumentation. 

Methods: 

Study extent description. Describe the temporal, spatial, and taxonomic extent of the study. This 
information supplements the coverage information you may have provided. 

Study extent: 

Sampling description. Describe the sampling design of the study. For example, you might de- 
scribe the way in which treatments were assigned to sampling units. 


Sampling: 


*Required fields 
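
The form's required fields and its degrees/minutes/seconds coordinates lend themselves to a simple check before submission. The sketch below is an assumption about how that might be done, not part of the form used in the case study: the required labels are the starred fields listed above, while the function names and example values are hypothetical.

```python
# Labels of the fields marked with an asterisk on the form above.
REQUIRED_FIELDS = [
    "Title",
    "Abstract",
    "Keywords",
    "Data Owners",
    "Contacts",
    "Usage rights",
    "Geographic coverage",
    "Temporal Coverage",
]


def missing_required(form: dict) -> list:
    """Return the required field labels that are absent or blank."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(form.get(name, "")).strip()
    ]


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees.

    South and west are returned as negative values.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


# Hypothetical, partially completed form.
form = {
    "Title": "Placeholder title",
    "Abstract": "Placeholder abstract",
    "Keywords": "   ",  # blank entries count as missing
}

print("Still required:", missing_required(form))
print(dms_to_decimal(40, 30, 0, "N"))   # 40.5
print(dms_to_decimal(86, 30, 0, "W"))   # -86.5
```

Converting coordinates to signed decimal degrees gives the bounding box a single consistent format, whatever notation the submitter used on the form.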
INTRODUCTION


The University of Minnesota (UMN) team col- 
laborated with a civil engineering lab research- 
ing the structural integrity of bridges, experi- 
mentally and within the state of Minnesota, 
to identify the data information literacy (DIL) 
skills that graduate students in that discipline 
needed to be successful researchers. In-depth 
interviews with the civil engineering group 
found that graduate students lacked DIL skills, 
particularly metadata and data description, eth- 
ics and attribution, and digital preservation. The 
absence of these skills negatively impacted the 
students’ abilities to effectively pass their data 
sets on to the next graduate student on the 
project. 

Based on these findings, in the fall of 2012 
the authors launched an instructional response 
to address the DIL skills absent from the cur- 
riculum. This instructional approach utilized a 
modularized e-learning format to reach busy 
graduate students (Brenton, 2008) through an 
extracurricular Data Management Course. The 
DIL team created a seven-module non-credit 
online course (http://z.umn.edu/datamgmt) 
using Google Sites, ScreenFlow, and YouTube. 
The self-paced course allowed students to com- 
plete the requirements outside of their formal 
course work and research activity. As a compo- 
nent of the course, each student wrote a draft 
data management plan (DMP) for creating, 
documenting, sharing, and preserving his or her 
data using a template offered by the instructors 
that aligned with each of the seven modules. 
The instructors offered this online course to 
all structural engineering graduate students in 
the fall of 2012 (11 students enrolled), giving 
students the whole semester to complete the re- 
quirements, and then opened up the course to 
any science, technology, engineering, or math- 
ematics (STEM) graduate student in the spring 


of 2013. Forty-seven students enrolled in the 
spring semester (for a total of 58 students over- 
all). Five students from the fall semester com- 
pleted the course (three out of these five chose 
to defer their participation to the spring semes- 
ter when they expected to work with research 
data) and six additional students completed the 
course in the spring. The results of an assess- 
ment survey sent to students immediately after 
completing the course, iterative feedback on 
their completed DMP, and a follow-up survey 
on how they implemented the DMP 6 months 
after taking the course were positive. Results 
from this course informed the development of 
a “flipped classroom” version of the course in 


the fall of 2013. 


DATA MANAGEMENT TRAINING 
AND PRACTICE IN THE CIVIL 
ENGINEERING DISCIPLINE 


Currently civil engineering poorly defines its 
disciplinary expectations regarding teaching 
data management to its students. The topic 
of data literacy can only be inferred from exist- 
ing learning outcomes or other standards that 
touch upon data tangentially, usually under 
outcomes that focus on the overall experimen- 
tation process. 

The American Society of Civil Engineers’ 
engineering curriculum, Civil Engineering Body 
of Knowledge for the 21st Century: Preparing the 
Civil Engineer for the Future (BOK 2) (ASCE, 
2008), does not address data literacy explicitly. 
Currently the integration of these skills into 
the graduate-level curriculum remains com- 
pletely voluntary. Students graduating have 
no guarantee of receiving formal education in 
the best practices of data management. Many 
students learn through informal instruction or 


address the problem when they suffer their own 
data loss. 
A report produced between iterations of the 
BOK, Development of Civil Engineering Curri- 
cula Supporting the Body of Knowledge for Profes- 
sional Practice, found room for improvement in 
the depth of students’ engagement with data, 
citing one example where “students are not able 
to take an open-ended real world situation and 
design the experiments that would provide the 
necessary data to solve the problem” (American 
Society of Civil Engineers Curriculum Com- 
mittee, 2006). 
Data literacy skills can be inferred in many 
of the outcomes in the BOK 2's seventh 
outcome group, “Experiments.” The relevant 
outcomes are 

• Identify the procedures . . . to conduct civil engineering experiments 
• Explain the purpose, procedures . . . of experiments 
• Conduct experiments . . . according to established procedures 
• Analyze the results of experiments (ASCE, 2008, p. 106) 


Data literacy can also be inferred from the 
outcomes regarding communication (BOK 2, 
Outcome 16), which call for students to “use 
appropriate graphical standards in preparing 
engineering drawings” and “[o]rganize and 
deliver effective . . . graphical communica- 
tions” (ASCE, 2008, p. 110). It can be read 
as part of Outcome 13: Project Management, 
if the new standard procedure for conducting 
experiments includes creating a plan to man- 
age data, including organization, security, and 
preservation (now mandated by some funding 
agencies). 

The engineering field, more widely, shares 
this opacity of expectation with regard to data 
management. The outcomes suggested in the 
BOK 2 echo those already implemented by the 
Accreditation Board for Engineering and Tech- 
nology (ABET) in their outcome, “an ability to 
design and conduct experiments, as well as to 
analyze and interpret the data” (ABET, 2012, 
General Criterion 3[b]). 

Locally, UMN students and faculty re- 
ceive somewhat varied and inconsistent DIL 
training. For example, the university requires 
all principal investigators (PIs) of grants to 
complete one of two Web-based instructional 
modules on the “best practices of research in- 
tegrity” (University of Minnesota Research 
Education and Oversight, 2014). These mod- 
ules cover some aspects of data control and 
intellectual property concerns. However, these 
responsible conduct of research (RCR) mod- 
ules are only required for PIs and are not well 
described or discoverable to those looking for 
just-in-time data management education. Be- 
ginning in 2010, researchers could supplement 
that training with workshops taught by the li- 
braries on “Creating a Data Management Plan 
for Your Grant Application” or “Introduction 
to Data Management for Scientists and Engi- 
neers,” available as drop-in library workshops 
and online video recordings (University of 
Minnesota Libraries, 2014). The former work- 
shop reached more than 300 faculty members 
and is offered for RCR continuing education 
credit (Johnston, Lafferty, & Petsan, 2012). 
However, both RCR training and library-led 
workshops were designed specifically for fac- 
ulty PIs and therefore do not target the gradu- 
ate student population. 

It is possible that data management skills 
are being addressed, along with other informa- 
tion literacy competencies, in student research 
experiences such as undergraduate research op- 
portunities programs, research assistantships, 
or cooperative educational programs, but the 
literature on information literacy has focused 
primarily on information retrieval skills (Jeffryes 
& Lafferty, 2012). One student in our 
study mentioned receiving some data manage- 
ment skills in an introductory research meth- 
ods class, but considered it too early in her stu- 
dent career to be useful to her current research 
project. The current integration of data man- 
agement skills into the graduate curriculum is 
neither constant nor at the point of need. 

The DIL team also investigated the cur- 
rent data management best practices used by 
the discipline locally. One of the graduate stu- 
dent subjects worked in the Multi-Axial Subas- 
semblage Testing (MAST) Laboratory, which 
provided explicit best practices for data man- 
agement and support for data upload to the 
national NEEShub data warehouse, a National 
Science Foundation-funded data repository 
for earthquake engineering data. The other stu- 
dents in the study population did not receive 
documented support or management guidance 
during their research. 

Data repositories, examples of curated data, 
and management protocols exist for some sub- 
disciplines relevant to the work conducted by 
the research population. The student work- 
ing with the MAST Laboratory was required 
to post her data into NEEShub. Although 
the other researchers were not connected to a 
specific data repository, Table 7.1 provides ex- 
amples of metadata schemas and requirements 
that researchers in structural engineering might 
encounter. 

We discovered documentation and training 
opportunities provided by these bodies through 
Internet searches. Overall we found two disci- 
plinary leaders within structural engineering, 
NEES and NISEE, both of which focus on 
the curation of earthquake engineering data 
(NEEShub, 2009; Thyagarajan, 2012; Van Den 
Einde et al., 2008; Wong & Stojadinovic, 2004). 


METHODOLOGY 


The UMN team interviewed the members of a 
structural engineering research group consist- 
ing of one faculty member and four graduate 
students ranging in experience from a first- 
year graduate student to a student in her final 
semester. The interview instrument, based on a 
modified version of the Data Curation Profiles 
Toolkit instrument (available for download at 
http://dx.doi.org/10.5703/1288284315510), 
allowed us to gather detailed information 
about the practices, limitations, needs, and 
opportunities for improving DIL practices 
from the perspective of both the faculty mem- 
ber and graduate students in the subject area. 
We collected and evaluated relevant docu- 
mentation, including data set examples and 
supporting research practices. 

The interviews took place between March 
13, 2012, and April 20, 2012. These struc- 
tured, 1- to 2-hour interviews took place in 
a library conference room using two audio re- 
corders each producing a file that a graduate 
assistant transcribed for analysis. The inter- 
view comprised two components: a worksheet 
that participants filled out and a list of follow- 
up questions that were asked of interviewees 
based on their responses from the worksheet. 
The data we collected, including the sample 
of the research data provided by the research 
group, the interview transcripts and audio 
files, and the interview worksheets, were ano- 
nymized, compiled into a Microsoft Excel file, 
and analyzed. 


RESULTS OF THE NEEDS ASSESSMENT 


The interviews provided a snapshot of the DIL 
skills needed for structural engineering gradu- 
ate students at UMN. The analysis revealed 
several needs at various stages throughout the 
data life cycle. It was clear that the students had 
no formal training in DIL. Students reported 
collecting various types of data, but primarily 
data from sensors placed on the bridges they 
were evaluating, to study bridge integrity fac- 
tors. The lab works with and receives funding 
from national and state agencies to conduct its 
research projects. These project partnerships 
have a noticeable effect on the treatment and 
handling of the data. The student working 
within NEES was expected to share data via 
the processes and standards for sharing and cu- 
rating data developed by the NEES repository. 
The state agency, on the other hand, claimed 
ownership over the data and required approval 
before the data could be shared. Although 
the work of the lab was influenced by the ex- 
pectations of its external partners, no formal 
policies or procedures (for documenting, or- 
ganizing, or maintaining data) existed in the 
lab itself. As a result, individual students ap- 
proached data storage and management in dif- 
ferent ways. The faculty researcher expressed 
concern about students’ abilities to understand 
and track issues affecting the quality of the 
data, to transfer the data from their custody 
to the custody of the lab when they graduated, 
and to take steps to maintain the value and 
utility of the data over time: “The skills that 
they need are many, and they don’t necessarily 
have it and they don’t necessarily acquire it in 
the time of the project, especially if they’re a 
Master’s student, because they’re here for such 
a short period of time.” 

TABLE 7.1 Data Repositories Identified in the Disciplinary Environmental Scan of Civil Engineering 

Repository                          Location                             URL 
NEEShub (earthquake engineering)    Purdue University                    http://nees.org 
NISEE (earthquake engineering)      University of California, Berkeley   http://nisee2.berkeley.edu 
DARPA Center for Seismic Studies    Arlington, Virginia                  http://gcmd.nasa.gov/records/GCMD_EARTH_INT_SEIS_CSS_01.html 

We asked the participating faculty and stu- 
dents to indicate the importance for graduate 
students to become knowledgeable in each of 
the 12 competencies of DIL, by using a 5-point 
Likert scale, and then to explain their choices. 
Interviewees identified additional skill sets they 
saw as important for graduate students to ac- 
quire (see Figure 7.1). 

In the course of interviewing the graduate 
students, we found that certain steps in the data 
life cycle were present regardless of the research project, 
though the students did not use a consistent 
vocabulary when describing these steps (see 
Table 7.2). 

To analyze the skills and needs described in 
the interviews, we reviewed the results in the 
context of each of the stages of the data life 
cycle. Although the students did not explic- 
itly identify preservation as a step in their data 
life cycle, they mentioned critical aspects of 
this topic throughout the results phase. These 
observations provided a foundation for a gen- 
eralized approach to understanding the data 
interactions of structural engineering graduate 
students in a research group. 
[Figure 7.1: chart comparing the ratings of the Civil Engineering Faculty Member (n = 1) with the Average Graduate Student Response (n = 4). Competency labels recoverable from the chart: Interoperability; Data Processing and Analysis; Ethics and Attribution; Cultures of Practice; Metadata and Data Description; Data Management and Organization.] 

Figure 7.1 The rating of DIL skills by the UMN faculty member and the average 
graduate student response. Scale: 5 = essential; 4 = very important; 3 = important; 
2 = somewhat important; 1 = not important. (Note: The faculty member did not rate 
Discovery and Acquisition.) 


Stage 1: Raw Data 
In the first module of the interview we asked 
the graduate students to describe the type of 
data with which they worked. All graduate stu- 
dents reported using sensor data as the crux of 
their research projects. Three out of the four 
graduate students collected data for projects 
that generated real-time sensor data to moni- 
tor the performance of local bridges, while one 
graduate student generated experimental data 
and simulations on concrete column perfor- 
mance in simulated earthquake conditions. 
Although the expectations of their external 
partners influenced the work of the lab, the 
lab itself did not have formal policies or proce- 
dures in place for documenting, organizing, or 
maintaining their data. As a result, individual 
students approached data storage and manage- 
ment in different ways. The faculty researcher 
expressed concern about his students’ abilities to 


understand and track issues affecting the qual- 
ity of the data, to transfer the data from their 
custody to the custody of the lab upon gradu- 
ation, and to take steps to maintain the value 
and utility of the data over time. For example, 
the faculty interview highlighted the need for 
students to understand the potential hazards 
of collecting “bad” data. The faculty member 
thought that having a better understanding of 
how sensors collect data might help. Several 
students mentioned knowing about potentially 
disruptive elements such as temperature con- 
ditions or scheduled construction/testing that 
might impact their data; however, their pro- 
cesses and documentation did not merge these 
events with the data they collected. 


Stage 2: Collection and Organization 
In discussions regarding data collection and or- 
ganization, more trends emerged: 


• Students used date-based file-naming 
  structures, even when they weren't fa- 
  miliar with the concept of a file-naming 
  structure. As one student remarked: “I’ve 
  never even heard of a file naming system.” 
• Students did not consider data security 
  an issue and felt that they had adequate 
  protections in place. 
• Backup of their data was often spo- 
  radic or nonexistent. Two of the stu- 
  dents displayed some confusion about 
  the concept of data backup versus data 
  redundancy. For example, one student 
  described her backup process as copying 
  files to a separate folder on her desktop 
  (which would not protect against theft 
  or computer damage). 
• Students agreed that they had no formal 
  DIL instruction but had to rely on their 
  peers, family, and previous experience for 
  direction. As one student described: “I’ve 
  had many projects with Excel files and 
  stuff that I’ve needed to save, and I guess 
  I learned [data management] just out of 
  habit, mainly.” 


TABLE 7.2 Data Life Cycle Stages as Described by the Case Study Graduate Students 

                                              Student Response 
Student     Initial               Second                Third                 Fourth                Fifth 
Grad #1     Raw sensor data       Processed data        Processed with        Comparison (with      Share the data 
                                                        figures               other research)       (stages 1 and 2) 
Grad #2     Raw                   Excel 1               Excel 2               Stress calculation/   Final Excel file 
                                                                              force and moment 
                                                                              calculation 
Grad #3     Raw numbers           Organization          Analysis and 
                                                        conclusion 
Grad #4     Data download from    Organize data into    Analyze data          Create alarms to 
            a website             test folders and                            warn of potential 
                                  regular activity of                         problems on the 
                                  bridge folder                               bridge 
Data Stage  1. Raw data           2. Collection and     3. Processing and     4. Results            5. Sharing and 
                                  organization          analysis                                    archiving 
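
The file-naming and backup practices discussed in this list can be sketched in a few lines of Python. This is an illustration only, not a procedure the students or the lab used: the naming pattern, the function names, and the example values are all assumptions. The sketch copies a file to a backup directory and checks that the copy is byte-identical. It cannot check that the backup directory sits on a different physical device, which is what distinguishes a real backup from the "separate folder on the desktop" approach.

```python
import hashlib
import shutil
from datetime import date
from pathlib import Path


def dated_name(project: str, description: str, day: date, version: int, ext: str) -> str:
    """Build a date-based file name (hypothetical pattern: date_project_description_vNN.ext)."""
    return f"{day.isoformat()}_{project}_{description}_v{version:02d}.{ext}"


def sha256(path: Path) -> str:
    """Return the SHA-256 checksum of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def back_up(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and confirm the copy matches the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps where possible
    if sha256(source) != sha256(target):
        raise IOError(f"backup of {source} does not match the original")
    return target
```

For example, `dated_name("bridge01", "strain-raw", date(2012, 3, 13), 1, "csv")` yields `2012-03-13_bridge01_strain-raw_v01.csv`. In practice `backup_dir` would point to an external drive or network share rather than the same disk.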


Students used formal and informal documentation 
practices to record the data collection process, 
and changes made to the data were ad hoc and 
varied. For example, while some students labeled 
columns in Excel, additional information, such as 
the bridge sensor locations, was kept in multiple 
places separate from the data files (e.g., in e-mail 
correspondence or schematic drawings). Most 
of the students did not have an understanding 
of the concept of metadata. Only one of the 
graduate students was familiar with the term, 
and when asked to define it the student replied, 
“It means data captured and saved during the 
test.” The other students all responded nega- 
tively when asked if they were familiar with the 
term. Regardless, all of the students provided 
some level of metadata to the data they were 
working with, but the majority were not col- 
lecting or applying it in an intentional or for- 
mal manner. 
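
One way to apply metadata intentionally is to keep a small description file next to each data file, so that column meanings and sensor locations travel with the data instead of living in e-mail or drawings. The sketch below is an assumption for illustration: the file-name suffix, the field names, and every example value are hypothetical and do not come from the study.

```python
import json
from pathlib import Path


def write_sidecar(data_file: Path, metadata: dict) -> Path:
    """Write metadata next to data_file as <file name>.metadata.json."""
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar


# Hypothetical example record; every value here is invented for illustration.
example = {
    "title": "Strain readings, hypothetical bridge 01",
    "collected_by": "Grad #1",
    "columns": {
        "timestamp": "local time, ISO 8601",
        "S1": "strain gauge 1, microstrain",
    },
    "sensor_locations": {"S1": "hypothetical: span 2, bottom flange, midspan"},
    "known_events": ["hypothetical: deck construction on 2012-04-02"],
}
```

Calling `write_sidecar(Path("strain.csv"), example)` would create `strain.csv.metadata.json` in the same folder as the data file.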

When asked if they had any means of docu- 
menting the steps for someone else to repeat, 
the students described the inefficiencies of their 
own system. One student admitted, “I guess if 
I were to repeat [the research project], I would 
probably do it in a different way. I could prob- 
ably document what I’ve done and I probably 
will do so, but then I'll also suggest maybe 
keeping things a little less complicated.” 


Stage 3: Processing/Analysis 
Each of the graduate students described a pro- 
cess for analyzing, visualizing, and making con- 
versions of the data beyond the original raw 
data stage. The majority of the graduate stu- 
dents spoke of a process of converting ASCII 
text files into Excel for further manipulation 
and sense making. One graduate student used a 
proprietary sensor program that allowed for data 
manipulation within her Web-based software. 
Regardless of format, they described a process 
of further manipulation of the data, such as re- 
moving “bad” data (i.e., bridge sensor readings 
contaminated due to noise during construction), 
synthesizing the rough data using equations, and 
creating graphical representations of the data 
(“plotting”), all to better communicate findings. 
The faculty member held the graduate stu- 
dents’ facility with Excel and MATLAB in high 


esteem, but had some concern that students 
weren’t receiving all the support they needed in 
more advanced data analysis, saying: 

It’s the relational databases . . . and their ca- 
pabilities for statistical analysis that are a little 
weak. And there are courses they can take on 
campus for the statistical and the relational 
databases, so maybe it’s something that we 
should be requiring. The problem is that if 
they’re going to do a Master’s thesis, they take 


only seven courses. 


He echoed the sentiment for further devel- 
opment of student skills in this area by noting 
that students would benefit from further edu- 
cation on the strategy behind data plotting. His 
ideal would be for graduate students to demon- 
strate an “ability to take the data and come up 
with a way of conveying it so that the reader 
can pick it up very quickly.” Indeed one stu- 
dent described his process of creating data vi- 
sualizations in Excel as “mostly trial and error.” 

The faculty member also specifically called 
out the need for students to be able to identify 
and track the quality of the data they were col- 
lecting when it may have been compromised 
by outside forces, such as with construction 
on the bridge where they collected sensor data. 
The professor commented that the students 
weren’t currently tracking this aspect of their 
data analysis in the documentation, but “it 
would be nice, especially when they’re collect- 
ing huge amounts of data, if we could some- 
how get measures of the quality of the data, 
statistically. And if we could use these measures 
to keep track of getting good data and when 
we’re not getting good data.” 
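
The kind of quality tracking the faculty member describes could start with something as simple as merging a log of known disruptive events with the sensor readings. The following is a minimal sketch under stated assumptions: readings are (timestamp, value) pairs, events are (start, end, label) triples, and the "fraction clean" figure is a crude stand-in for a proper statistical quality measure. None of these structures or names come from the lab.

```python
from datetime import datetime


def flag_readings(readings, events):
    """Return (timestamp, value, note) triples; note names any overlapping event."""
    flagged = []
    for timestamp, value in readings:
        notes = [label for start, end, label in events if start <= timestamp <= end]
        flagged.append((timestamp, value, "; ".join(notes)))
    return flagged


def fraction_clean(flagged):
    """Share of readings with no overlapping event (a crude quality measure)."""
    if not flagged:
        return 0.0
    return sum(1 for _, _, note in flagged if not note) / len(flagged)
```

Readings are annotated rather than deleted, so the raw data remain intact and the decision to exclude "bad" data stays documented.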


Stage 4: Results 
During discussions about ensuring long- 
term access to the data collected, numerous 
preservation concerns arose. Several issues were 
not addressed in the research group, such as 
physical storage (e.g., desktop computers used 
by graduate students would eventually be re- 
cycled) and file migration (e.g., use of a propri- 
etary version of Excel that might become incom- 
patible in the future) for data stored in the lab. 

Students were unclear about whose respon- 
sibility it was to preserve the data for long-term 
access. Additionally, they were unclear about how 
to preserve data for 20 to 50 years, or the life of 
the bridge. For example, one student suggested 
that the contracting state agency held the respon- 
sibility for preserving the data and that the agency 
would keep the data “forever.” When asked to 
identify the steps needed to preserve the data and 
if the state currently implemented those steps, 
the student responded: “I think that’s just sort of 
what they do. . . . [B]ecause they've had issues in 
the past where people have completed projects 
and then others have wanted to repeat them or go 
more into depth with them and then haven’t been 
able to find any of the original data for it, . . . I 
think that’s kind of just their policy.” When asked 
for steps to preserve the data set the graduate stu- 
dent responded, “Just putting [it] onto that hard 
drive and making sure it doesn’t melt I guess.” 

In our conversation with the faculty mem- 
ber, the issue of data versioning for long-term 
access and preservation arose. Along with iden- 
tifying and implementing steps to preserve and 
store data for the long term, researchers must 
choose which versions of their data should be 
preserved for future use and authenticity. The 
professor responded to the issue of versions: 


This is an interesting problem. There are ac- 
tually multiple stages and multiple things 
that you do [to the data], and so how many 
data sets do you store? Clearly, you want the 
raw data. That’s the purest form. And clearly 
you want the data that you think has been 


completely digested as you think it needs to 
be. But how many of the intermediate stages 


do you want to keep? 


Stage 5: Sharing and Archiving 

Each of the four students shared his or her data 
results in some way. One student shared her 
data in a formal process through the manda- 
tory data archiving protocol of the NEEShub 
program, while the other students shared their 
data with state contractors, their advisor, and 
the graduate students continuing the project. 

Although students had little to no expe- 
rience with data citation, when asked their 
thoughts on its importance, they reported an 
understanding of the value of this practice. A 
student explained: “Because you need to know 
where this data is coming from, and obviously 
if it’s not your own, then I feel like it’s impor- 
tant to make other people aware that it is not 
data that you actually collected yourself.” 

As to the potential for other researchers to 
reuse their data, only one student felt that his 
analyzed data was unique and therefore of po- 
tential value. The other students had a harder 
time imagining how their data might be useful 
to researchers outside of their specific project. 
The graduate students demonstrated little to 
no knowledge of data repositories in their field 
or experience using another researcher’s data 
from outside their lab. One student mentioned 
that looking at another researcher’s data in the 
literature review led to his experiment, but he 
found the data by chance and the repository 
was not a standard destination. 

The graduate students did not see the value 
in archiving similar data sets together in a 
subject-based repository structure. Referenc- 
ing the Interstate 35W bridge in Minneapolis, 
which was rebuilt after the tragic 2007 col- 
lapse with sensors measuring strain in a similar 
way to the data obtained by our interviewee, 
the student noted, “Unless you could come up 
with some good way to compare the two sets 
of data, I don’t know really what use it would 
be to collect them all into one place.” The stu- 
dent did see the value of data repositories to 
save on space, however, so that “there aren’t 50 
external hard drives floating around.” 

Issues around privacy and confidentiality 
were a complex topic for students working on 
a state-contracted project analyzing bridge sen- 
sor data. Students knew to contact their advisor 
with requests to share the data owned by the 
state agency. One student described her cau- 
tion with presenting the state-funded data re- 
sults at a conference: “I had to get permission 
from [the state contractor] first before I could 
even do that.” However, the reasons beyond 
“ownership” were unclear. The faculty member 
was able to explain the sensitive nature of the 
data when asked if the state agency had any 
specific interests in sharing this data beyond 
the agency. The professor replied: 


That’s a really good question. They would like 
to share data, as long as they can protect their 
interests. And I don’t mean any advantage in 
having that data. What they're afraid of is this 
data represents measurements that are taken 
off of real bridges, and that can very easily 
be misinterpreted and used to undermine a 
bridge that’s actually not in bad shape, and 
then present a bloated and incorrect scenario 
about how bad the bridge problem is. Or 
the claim that a bridge is in great condition, 
when in fact it needs to be replaced. For that 
reason, they are very, very, very unwilling to 


have anything like open access. 


All Stages 
With our findings, the UMN team developed a 
list of skills needed by graduate students in this 


discipline. These are detailed in Appendix A to 
this chapter. 


E-LEARNING APPROACH TO TEACHING 
DATA INFORMATION LITERACY SKILLS 
TO GRADUATE STUDENTS 


The benefits of taking an e-learning approach to 
educating graduate students are enumerated in 
the literature reviews and discussions of many 
studies (Gikandi, Morrow, & Davis, 2011; 
Safar, 2012). The U.S. Department of Educa- 
tion (2010) in its meta-analysis of the literature 
found that “students in online conditions per- 
formed modestly better, on average, than those 
learning the same material through traditional 
face-to-face instruction” (p. xiv). Gikandi, 
Morrow, and Davis's review of formative assess- 
ment in online learning, citing the influence of 
Oosterhof, Conrad, & Ely (2008), posited that 
online learning benefitted students by provid- 
ing instructors “many additional opportunities 
to dynamically interact with and assess learners” 
(p. 2333). Gruca (2010) nicely outlined ben- 
efits of libraries’ adopting e-learning platforms 
to deliver their instruction. Most resonant with 
our experience was her assertion that “e-courses 
are equally accessible for full-time and remote 
students and may be a step towards inclusion 
for disabled students” (Gruca, 2010, p. 20). 
We wanted our instruction to be as accessible 
as possible to graduate students who carried a 
full course load as well as a time-intensive re- 
search schedule. Although Gruca (2010) never 
explicitly used the phrase, many of the benefits 
of e-learning she listed support the scalability 
of instruction inherent in an e-learning plat- 
form. Gruca stated that e-learning “saves teach- 
ers and students’ time” and “[o]nce published, 
an e-course may be improved and used many 


times” (p. 20). The ability to scale would be 
integral to ensuring expansion of our work at a 
university where we support tens of thousands 
of students. 


Learning Objectives and Assessment Plan 


Conceptualization and creation of the course 
took place over the summer of 2012. Table 7.3 
shows the learning outcomes for each module 
of the course. 

In the course design phase of the project, we 
met with the faculty partner to vet the learn- 
ing outcomes and strategize on connecting 
students to our course content. Because the 
graduate-level curriculum was already quite 
full, the approach had to be a voluntary, ex- 
tracurricular program for students. The online, 
e-learning format was clearly a good fit. In ad- 
dition, modularized video lessons would be 
easy to download and watch on any device, which 
suited the busy graduate student lifestyle. 
The syllabus is in Appendix B to this chapter. 

We thought the course needed a real-world 
application in which the students might dem- 
onstrate or test their newly acquired skills. 
Therefore, building on our earlier success offer- 
ing data management training to researchers, 
we chose to use a DMP template as the framing 
device for course content delivery and evalua- 
tion. Each of the seven course modules mapped 
to a corresponding section of a DMP template 
where the student directly applied what he or 
she learned in the course. (See Appendix C to 
this chapter for a DMP template.) The result- 
ing seven course modules became 


1. Introduction to Data Management 
2. Data to Be Managed 
3. Organization and Documentation Methods 
4. Data Access and Ownership 
5. Data Sharing and Reuse 
6. Data Preservation Techniques 
7. Completing Your DMP 


Although data analysis and visualization 
skills came up in our interviews with faculty 
and students, we chose not to include them be- 
cause the librarians did not have the expertise to 
teach them. As an alternative we added a page 
to our course website pointing students to local 
and freely available resources and training. 

At the outset of our course design we decided 
that our guiding principle for creating online 
instructional modules would be to “utilize 
preexisting content.” With that philosophy in 
mind our first step was to find content openly 
available for reuse, including video, images, and 
e-learning tools that covered any of our data 
management topics. A library science practi- 
cum student helped review relevant content. 
We discovered many sources labeled for reuse, including
professional library-generated tutorials such as MANTRA
(http://datalib.edina.ac.uk/mantra), a UK-based data management
skills support initiative, as well as informal YouTube videos
and cartoons. We embedded several of these throughout the
modules after receiving permission from the authors. In addition,
we customized content from the in-person data 
management workshops that the UMN librar- 
ies have offered to focus on the particular needs 
of structural engineering graduate students. 

To create the modules we wrote scripts, cre- 
ated slides, and recorded videos for each of the 
seven topics. The scripts were written to incor- 
porate a logical flow of the information and to 
set up the student to respond to each learning 
outcome. Next, we built a slide deck in Microsoft PowerPoint and
then captured the screencast presentation with voiceover using
ScreenFlow (http://www.telestream.net/screenflow/overview.htm),
video recording software for Apple computers.

TABLE 7.3 Descriptions and Learning Outcomes of the Seven Modules
in the UMN Data Management Course

1. Introduction to Data Management
Brief description: In this module we introduce the concept of data
management using an example from the academic discipline
Learning outcomes (students will . . .):
• Describe the benefits of data management to explicitly understand
  the benefits of participating in the course
• Articulate what they will get out of this program to reinforce the
  learning outcomes of the curriculum

2. Data to Be Managed
Brief description: This module helps students define what information
will be managed, document the data collection process, and create a
plan to store, back up, and securely house these data
Learning outcomes (students will . . .):
• Create a data inventory for their research project (e.g., data,
  project files, documentation) to not overlook any aspects of their
  DMP
• Write a backup and storage plan to avoid potential loss of data

3. Organization and Documentation Methods
Brief description: This module helps students plan for how to organize
their data, track versions, create metadata, and document data
collection for reuse
Learning outcomes (students will . . .):
• Plan an organizational structure for their data using a file naming
  system and directory structure that is well-documented and
  interoperable with other data sets to decrease versioning issues
  and data duplication
• Articulate a plan to collect and share the supplementary data
  points of their research to assist other researchers in making
  sense of their data
• Fill out a metadata schema example for their data to model ideal
  metadata practices

4. Data Access and Ownership
Brief description: In this module we illustrate some of the
intellectual property and access concerns that researchers face when
sharing their data with others
Learning outcomes (students will . . .):
• Name the stakeholders of their data to understand the potential
  intellectual property and ownership concerns with releasing their
  data to a broader audience
• Report potential access concerns with their data to plan for the
  appropriate access controls
• Identify potential access controls to secure their data prior to
  release

5. Data Sharing and Reuse
Brief description: In this module we describe the benefits of data
sharing and potential for reuse as well as introduce students to the
concept of data publishing and citation
Learning outcomes (students will . . .):
• Name the audience for whom the data will be shared to customize the
  documentation and format for potential reuse
• Explain an approach they will use to share the data to instill best
  practices for their future data sharing
• Cite their data in a properly structured format in accordance with
  emerging standards to prepare them to ethically reuse data in the
  future

6. Preservation Techniques
Brief description: In this module we introduce the preservation and
curation techniques used by information professionals who manage
digital information for long-term access
Learning outcomes (students will . . .):
• Explain the life span of potential use for their data to recognize
  the long-term value of their data
• Identify the relevant preservation-friendly file format for their
  research data to ensure long-term access to their digital
  information

7. Complete Your DMP
Brief description: This final module instructs students on how to
complete and implement their DMP within their lab, research group, or
future project
Learning outcomes (students will . . .):
• Map out an implementation plan to put their DMP into action
• Identify the components of a DMP to repeat the process with future
  research activities

ScreenFlow was chosen because it allowed us to capture and edit
existing YouTube videos that we embedded in PowerPoint presentations
and included in our modules. ScreenFlow also presented a relatively
easy-to-learn editing interface over alternative software such as
Apple iMovie or Adobe Captivate. After creating the videos, we
uploaded them to a YouTube channel to allow us to link or embed them
into content platforms. YouTube also facilitated closed captioning of
the videos, making them more accessible to a variety of learners.

The video content was organized on a Google Site as the course home
page at http://z.umn.edu/datamgmt (see Figure 7.2). The Google Site
allowed us to create a separate Web page for each module, each of
which includes the following components:

• Text descriptions of each module's learning outcomes
• Instructional video (embedded from YouTube)
• Assignment (links to the student's DMP template)
• Links to additional resources (if applicable)
• Cartoon illustration of a relevant data management concept


The course site is open to the public. 
We chose Google Sites over other campus
e-learning tools due to the ease of creation, dis- 
coverability, and potential for one-click “clon- 
ing” if the library adapts the course in future 
semesters or for disciplinary sections beyond 
civil engineering. 

Beta testing of the e-course revealed several 
minor errors and inconsistencies with the video 
modules and website. The test users were pri- 
marily UMN librarians and members of the 
DIL grant project. ScreenFlow allowed for quick 
video edits and insertions while the written-out 
scripts proved easy to edit and rerecord. 

To assess the success of the instructional intervention we used a
three-pronged assessment plan including formative and summative
assessment techniques. Throughout the course students would take the
information covered in the individual modules and apply it directly
to their own research project through the creation of a DMP. For each
student, the instructors created a unique copy of a DMP template and
shared it via Google Drive (see Appendix C to this chapter) upon
enrollment in the course. We used the completion of the DMP template
as a formative assessment throughout the course. Oosterhof, Conrad,
and Ely (2008) described formative assessment as “those [assessments]
that occur during learning,” analogous to “what a mentor does
continuously when working with an apprentice” (p. 7). The different
modules strategically mirrored the DMP template. This design made it
easy for students to create a real-world application. Since the
students’ DMP document was shared with the two instructors via Google
Drive, we could check on the students’ understanding periodically and
provide feedback via the “Comment” feature. This form of assessment
allowed us to gauge student understanding in an organic way that
would seem relevant to the students.

Figure 7.2 Screenshot of a module in the 2012 Data Management Course:
Structures Section.

For the second prong of our assessment plan, we sent a course
satisfaction survey to students immediately after they completed the
course (see Appendix D to this chapter). These
responses provided a summative view of each 
student’s experience in the course. The instruc- 
tors learned which aspects of the instructional 
approach were effective, and which needed fur- 
ther improvement. 

Figure 7.3 Announcement and e-mail invitation to participate in the
2013 spring Data Management Course.

The third prong measured the long-term impact of the course via an
online survey that we sent out 6 months after the completion of the
online course (see Appendix E to this chapter). This assessment was
to show us whether completing the course changed students’ practice
of managing research data, and whether the students had successfully
moved through the “hierarchical order of the different classes of
objectives” found in Bloom’s taxonomy, from knowledge, to
comprehension, to application, to analysis, to synthesis (Bloom,
1956, p. 18).
As Bransford, Brown, and Cocking (1999) 
stated in a report on the science of learning, “It 
is essential for a learner to develop a sense of 
when what has been learned can be used—the 
conditions of application” (p. xiii). 


Results of the Fall 2012 
and Spring 2013 Course 


At the end of the first week of the fall 2012 se- 
mester the two library instructors discussed the 
data management course during the Civil En- 
gineering Structures Seminar, a required course 
for all the graduate students in the “structures” 
track (around 20 students). We focused on 
why data management is important. At the 
end of the session the students completed a
“1-minute paper” explaining how they thought
a DMP would benefit their research. Subse- 
quently, 11 students enrolled. The students 
controlled their own progress through the 


course. The instructors sent e-mails three times 
throughout the semester to nudge students to 
participate: once at the semester’s midpoint, 
once a week before the course deadline (the last 
Friday of classes), and on the day of the dead- 
line of the course. The instructors periodically 
reviewed the DMPs of the enrolled students in 
Google Drive to provide feedback. There was 
no progress on the templates until late in the 
semester. 

In the spring semester, we scaled the course 
to reach other researchers across our campus. 
We built the course so it would be relatively 
easy to replace the discipline-specific content 
with that of other research areas. In the spring 
of 2013, the instructors sought the help of 6 
subject librarians, liaisons to the engineering 
and other science disciplines on campus. With 
their help, we opened the course to graduate 
students from other engineering and science 
disciplines (see Figure 7.3). There were 47 en- 
rollees from 14 departments. No introductory 
session was offered in person as it had been in 
the fall due to the wide variety of students.

The spring course was similar to the fall semester
course, except that liaison librarians,
not the original course authors, sent periodic 
e-mail reminders to engage the students. Mid- 
way through the course, we offered an in- 
person 2-hour workshop that delivered all of 
the course material in a single, collaborative 
environment. Instead of working through the 
seven Web-based modules on their own, stu- 
dents could attend the workshop and ask ques- 
tions and get feedback in class. They could 
learn from peers and discuss the practical ap- 
plication of data management with them. Thir- 
teen students attended this session. 

Course completion included not only watching the video modules (or
attending the in-person session) but also completing a DMP. The plan
had to be submitted to instructors for feedback before the course
could be considered complete. At the end of the fall semester only
2 out of 11 students had completed the DMP template. Five students
asked for extensions or permission to defer their enrollment into the
next semester. The reasons for postponing included heavy workloads
and lack of an actual data set to which they could apply the
principles covered in the videos. Three of those 5 students who chose
to defer successfully completed the course in the spring, bringing
the number of fall enrollees who completed the course to 5 (a 45%
completion rate). In the spring, 6 out of the 47 students who signed
up successfully completed the course by turning in a written data
management plan (13% completion rate). Overall, we ended the 2012-13
academic year with a total of 11 graduate students completing the
course. This is a 19% completion rate (11 of 58 enrollees) for an
online, non-required class, higher than that for most MOOCs (massive
open online courses), which according to Parr (2013) is about 7%.
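
The completion rates quoted above are simple ratios of completers to
enrollees. The following is a minimal sketch in Python that reproduces
them from the counts reported in this section; the function and
variable names are illustrative only and are not part of the course
materials, and the overall figure assumes the fall and spring
enrollments (11 and 47) do not overlap.

```python
def completion_rate(completed: int, enrolled: int) -> int:
    """Return the completion rate as a whole-number percentage."""
    return round(100 * completed / enrolled)


# Counts reported in the text.
fall_enrolled = 11
fall_completed = 2 + 3      # 2 finished in fall; 3 deferred and finished in spring
spring_enrolled = 47
spring_completed = 6

fall_rate = completion_rate(fall_completed, fall_enrolled)
spring_rate = completion_rate(spring_completed, spring_enrolled)
overall_rate = completion_rate(
    fall_completed + spring_completed,   # 11 completers in total
    fall_enrolled + spring_enrolled,     # 58 enrollees, assuming no overlap
)

print(fall_rate, spring_rate, overall_rate)  # prints: 45 13 19
```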

We sent a four-question survey to all 11 
students once they finished the course, along 


with a certificate of completion for their UMN 
training history. Seven students (64%) com- 
pleted the survey and demonstrated a high 
level of satisfaction. One student summed up 
the course: 


This course gave me good techniques which
I will not only be able to implement in my
current research in addition to what I have
already been doing, but also use them in the
rest of my career.


We received five (45%) responses to the 6-month follow-up survey.
The questions mirrored the seven module topics of the course and the
primary learning objectives for each module. Overall the results and
comments were very positive. Comments also demonstrated understanding
of some of the primary learning objectives of the course, for
example, file naming and metadata schemas, as illustrated by this
comment:


Some forethought on naming and metadata
conventions goes a long way when managing
data. This aspect of the course was very
important and I have tried to employ it as often
as possible. I sense that many students and
possibly some researchers/professors don’t
commonly use a clear naming structure or
metadata schema.


Comments also highlighted some surpris- 
ing aspects of the course that students did not 
find relevant. For example, data ownership 
and access: 


This aspect of the class was also very thought
provoking but isn’t quite as relevant to my
data. However, I am involved with many
projects that have multiple organizations
with interest in common data and so, some
forethought on data ownership will help
clarify who is in charge of this data and how to
process/pass it along.


DISCUSSION: LESSONS LEARNED 
FROM THE E-LEARNING APPROACH 


Our two semesters proved to be learning expe- 
riences in the presentation of this course. We 
applied key lessons from the first iterations of 
the e-learning approach, which included con- 
necting to actual student data sets and provid- 
ing generic simulations, as well as incentivizing 
the course to ensure completion. 


Connection to Actual Data Sets 

We attempted to make this course applicable 
by tying course content to the actual work 
students were doing in their labs. Therefore, 
students had to have their own research data 
to make the course useful. But many of the 
students interested in the course were not far 
enough along in their program to have started 
collecting data for their project. In the in- 
person workshop we included an example of a 
completed DMP that provided students with 
a data set and a model they could follow when 
constructing their own plans. An approach to 
consider for students who do not have a re- 
search project is to provide a generic simulation 
to which students could apply the principles 
addressed in the video modules. 


Ensuring Completion 

Although a large number of students enrolled 
in the course, the completion rate was low. In 
the first iteration of the course a certificate of 
completion was used as a prompt for comple- 
tion (on the advisement of our faculty partner), 
but only 2 of 11 students completed the course 


(though 5 more asked to defer their comple- 
tion). We are considering promoting the course 
through principal investigators and lab advisors. 

We learned many lessons from implementing an online instruction
model for teaching DIL. For example, we believed that our approach
would allow busy graduate students to engage in supplementary
materials on their own time. However, setting aside time to
self-educate proved to be a major hurdle for students. The response
to the optional workshop showed that students were willing to attend
training in person because it provided a structure for completion.
As one student stated: “I really liked the in-person lecture. Made it
easy to set aside one block of time to go through all the information
and have staff on-hand to answer questions.”

Therefore, in response to these findings we 
changed the pedagogy of the course in fall 
2013 to a “flipped course.” Participants in the 
workshops met for 1-hour sessions once a week 
for 5 weeks. Students watched an online video 
before attending the corresponding hour-long 
hands-on workshop. In class we used fictional 
data scenarios from a wide range of disciplines 
to introduce students to practical aspects. To 
encourage completion, we offered participants 
who attended all five data management work- 
shops a certificate of data management training 
for their UMN training records. Developing a 
written DMP was optional. 

The first offering of the flipped course was 
a success. To accommodate the number of 
students interested in attending, the library
offered two classes for each of the five sessions.
Eighty-three students enrolled in at least one 
of the five sessions. Attendance was a little over 
50% on average for the series. Sixteen students 
(33% of attendees) completed all five sessions 
and received a certificate of data management 
in their UMN training history. 


CONCLUSION 


The results of this case study have been used 
to develop and implement several variations 
of online and flipped classroom instructional 
interventions. The UMN DIL team drafted 
a set of learning outcomes targeting the per- 
ceived greatest needs of graduate students that 
arose in the interviews. The partnering civil 
engineering faculty member vetted these out- 
comes and provided suggestions for involving 
students with the topic. Incorporating content 
from existing sources and tying instruction to 
federal requirements for data management, we 
developed a seven-module online course over 
three semesters. 

The UMN librarians applied their expertise 
in organizing and managing information to the 
curation of research data. The civil engineering 
faculty member provided a reality check to en- 
sure that the skills would speak to the students’ 
experiences and fit within disciplinary norms. 
This partnership proved mutually beneficial, 
since the faculty could address a skill gap with- 
out creating the content to fill that gap. It gave 
the librarians a new way to engage with stu- 
dents and to introduce ourselves as resources 
for managing and sharing data. 

This case study has been a starting point 
in the conversation of disciplinary norms. A 
replication or adaptation of this process ad- 
ministered more widely would gauge the DIL 
needs of students across institutions in the civil 


engineering field. Once the educational gaps 
have been identified, the ASCE’s BOK should 
be updated to address these skills. 

Because the course lives online in a modular 
package, we were able to repurpose the peda- 
gogy and teach the course in a way that bet- 
ter met student needs. Moreover, students can 
revisit the course material online and continue 
to develop their DMP through the openly ac- 
cessible materials. 

The course provides a framework for other 
librarians who hope to learn more about data 
management themselves or want to build 
learning objects for their institutions. Through 
the promotion of the DIL website, social me- 
dia presence, and presentations at conferences, 
we have been in correspondence with librarians 
interested in examining what we are offering. 

On our campus we've seen a hunger for 
guidance on these issues from both faculty and 
researchers. This is a natural extension of classic 
library services, including information classifi- 
cation and organization as well as information 
literacy instruction. DIL is a key component in 
the librarian’s role on campus. 


NOTES 


Portions of this case study are reprinted with permission from
Johnston, L., & Jeffryes, J. (2013, February 13). Data management
skills needed by structural engineering students: A case study at the
University of Minnesota. Journal of Professional Issues in
Engineering Education and Practice.
http://dx.doi.org/10.1061/(ASCE)EI.1943-5541.0000154; and Jeffryes,
J., & Johnston, L. (2013). An e-learning approach to data information
literacy education. Paper presented at the 2013 ASEE Annual
Conference, Atlanta, GA. Available at http://purl.umn.edu/156951.




This case study is available online at
http://dx.doi.org/10.5703/1288284315479.


APPENDIX A: Data Information Literacy Skills Needed
by Graduate Students in Civil Engineering

We identified the following skills as important educational needs for
graduate students participating in structural engineering advanced
degree programs.

DIL Skills Needed by Civil Engineering Graduate Students, by Data Stage

1. Raw data generation

All students
• Understand how sensors work and respond to physical phenomena
• Download sensor log files securely
• Understand privacy issues associated with data (i.e., real-time
  bridge sensor data)
• Track external events and understand how these affect raw data
  generation (e.g., temperature, weather, maintenance)
• Determine the best way to manage data collected over time (i.e., DMP)

Some students
• Write a work plan/experimental design
• Create/read schematic representation of sensor locations on
  physical bridge
• Work with experimental laboratory personnel
• Troubleshoot issues with sensor hardware attached to structures
• Find/reuse existing data

2. Collection and organization

All students
• Organize data with temporal component (i.e., 15 min. increments)
• Collect/organize data from multiple sensor sources into a single file
• Track multiple versions of a data file shared with multiple people
• Create documentation about data collection
• Back up data appropriately
• Co-locate metadata and processing actions with organized data
• Create a custom file naming schema that can be easily understood
  by others
• E-mail documents securely/efficiently (so as not to make multiple
  versions)
• Track versions of the data and maintain authority control
• Be aware of university security policies (using laptop for remote
  data collection)
• Understand how to separate out data that was affected by external
  events

Some students
• Manage media files generated by MAST instrument (video, images)

3. Processing and analysis

All students
• Create documentation of analysis steps for future graduate student
  or data reuse
• Use known engineering theories to process data (temperature, age)
• Apply equations to transform data into results (e.g., sensor
  frequencies into stresses)
• Compare data to simulation results
• Identify trends
• Generate graphs and plots to visualize the data

Some students
• Understand how to find, use standards and code books
• Utilize programming tools such as MATLAB to create simulation data
• Analyze data in instrument-specific software program


4. Results

All students
• Report results of data analysis
• Explain trends in the data
• Create/plot graphs that accurately convey meaning of data

Some students
• Report data to funding bodies (NSF, state contracts)
• Show results in presentation format

5. Sharing and archiving

All students
• Understand the potential for others to reuse data
• Acknowledge the implications of accessing/sharing data using
  proprietary software
• Understand the scientific value of sharing data
• Archive data in the lab

Some students
• Deposit data into discipline repository (NEES)
• Archive locally on external hard drive

6. Preservation

All students
• Use file formats that allow long-term access
• Create preservation backup copies
• Understand the funder requirements for maintaining access to data
• Understand the issues/problems associated with the preservation
  of data
• Co-locate the data and documentation

DMP, data management plan; MAST, Multi-Axial Subassemblage Testing;
NSF, National Science Foundation; NEES, Network for Earthquake
Engineering Simulation.


APPENDIX B: Syllabus of E-Learning Online Course


Module 1: Introduction to Data Management 


In this section we will introduce the concept of data management using an example from an
academic researcher. After completing this module, students will:

• Describe the benefits of data management to explicitly understand the benefits of
  participating in the course
• Articulate what they will get out of this program to reinforce the learning outcomes of the
  curriculum

Assignment: Write one paragraph in Section 1, “Introduction,” of your DMP describing why
data management is important for this project.


Module 2: Data to Be Managed 


This module will help students define what information they will be managing, document the data 
collection process, and plan to store, back up, and securely house these data. After completing this 
module, students will: 


• Create a data inventory for their research project (data, project files, documentation, and
  so forth) to not overlook any aspects of their DMP
• Write a backup and storage plan to avoid potential loss of data


Assignment: In your DMP, complete Section 2, “Data Types,” by describing what data you will 
manage and including details on how you will store and back up these files. 
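
One way a backup plan of this kind can be checked in practice is to
confirm that each backup copy is byte-identical to the original. The
sketch below is an illustration only, not part of the course
materials: it assumes a checksum comparison (SHA-256) is an acceptable
verification step, and the file name and folder layout are
hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> bool:
    """Copy source into backup_dir and confirm the copy matches the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy = backup_dir / source.name
    shutil.copy2(source, copy)  # copy2 also preserves file timestamps
    return sha256_of(source) == sha256_of(copy)


# Demonstration with a throwaway file standing in for a sensor log.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "sensor_log.csv"  # hypothetical file name
    log.write_text("timestamp,strain\n2012-09-01T00:00,0.0012\n")
    verified = backup_and_verify(log, Path(tmp) / "backup")
    print(verified)
```

A mismatch between the two digests would indicate that the copy was
corrupted or that the original changed after the backup was made.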


Module 3: Organization and Documentation Methods 


This module will help students plan for how they will organize their data, track versions, create 
metadata, and prepare their documentation for sharing. After completing this module, students will: 


• Plan an organizational structure for their data using a file naming system and directory
  structure that is well documented and interoperable with other data sets to decrease
  versioning issues and data duplication
• Articulate a plan to collect and share the supplementary data points of their research to
  assist other researchers in making sense of their data
• Fill out a metadata schema example for their data to model ideal metadata practices

Assignment: Complete Section 3, “Data Documentation, Organization, and Metadata,” of your
DMP by describing what standards and documentation will be used in your project or lab.
(Optional: Embed video or images to describe your process. Fill in the metadata template to
describe your data set.)
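
To make the idea of a documented file naming system concrete, here is
a minimal sketch in Python. The convention it encodes
(project_sensor_YYYYMMDD_vNN.ext) and all names in it are hypothetical
examples chosen for illustration; they are not the convention taught
in the course, and a lab would substitute its own fields.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical convention: project_sensor_YYYYMMDD_vNN.ext
FIELDS = ("project", "sensor", "date", "version", "ext")
NAME_PATTERN = re.compile(r"^([a-z0-9]+)_([a-z0-9]+)_(\d{8})_v(\d{2})\.([a-z0-9]+)$")


def build_name(project: str, sensor: str, collected: date,
               version: int, ext: str = "csv") -> str:
    """Build a file name that follows the convention above."""
    return f"{project.lower()}_{sensor.lower()}_{collected:%Y%m%d}_v{version:02d}.{ext}"


def parse_name(filename: str) -> Optional[dict]:
    """Split a conforming file name into its fields, or return None."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


name = build_name("bridgea", "strain04", date(2012, 9, 14), 3)
print(name)                                 # bridgea_strain04_20120914_v03.csv
print(parse_name(name))
print(parse_name("final data (new).xlsx"))  # None: does not follow the convention
```

Because the version and date are part of the name, files sort in a
predictable order and a name that does not parse can be flagged before
it is shared.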


Module 4: Data Access and Ownership 


This section will illustrate some of the intellectual property and access concerns that researchers 
face when sharing their data with others. After completing this module, students will: 


• Name the stakeholders of their data to understand the potential intellectual property and
  ownership concerns with releasing their data to a broader audience
• Report potential access concerns with their data to plan for the appropriate access controls
• Identify potential access controls to secure their data prior to release


Assignment: In your DMP, complete Section 4, “Data Access and Ownership,” describing any 
access and ownership considerations your data may have. 


Module 5: Data Sharing and Reuse 


This section will describe the benefits of data sharing and potential for reuse as well as introduce 
students to the concept of data publishing and citation. After completing this module, students will: 


• Name the audience for whom the data will be shared to customize the documentation and
  format for potential reuse

• Explain an approach they will use to share the data to instill best practices for their future
  data sharing

• Cite their data in a properly structured format in accordance with emerging standards to
  prepare them to ethically reuse data in the future


Assignment: In your DMP, complete Section 5, “Data Sharing and Reuse,” describing how your
data will be shared for reuse. Update your DMP if data will be deposited in a data repository and 
include a preferred citation for your data set. 


Module 6: Preservation Techniques 


This module will introduce the preservation and curation techniques used by information
professionals who manage digital information for long-term access. After completing this module,
students will:


• Explain the life span of potential use for their data to recognize the long-term value of
  their data

• Identify the relevant preservation-friendly file format for their research data to ensure
  long-term access to their digital information


Assignment: In your DMP, complete Section 6, “Data Preservation and Archiving,” describing 
how your data will be preserved for long-term access. 


Module 7: Complete Your Data Management Plan 


This final module will instruct the students on how to complete and implement their DMP within 
their lab, research group, or future project. After completing this module, students will: 


• Map out an implementation plan to prepare them to immediately apply the information
  presented in the previous modules

• Identify the components of a DMP to repeat the process with future research activities


Assignment: Compile the final version of your DMP. Write a one-paragraph implementation 
plan describing how the DMP will be used in your future research. Submit the final DMP (Word 
doc or PDF) to the instructors for review and feedback. 
APPENDIX C: Data Management Plan (DMP) Google Docs Template


DATA MANAGEMENT PLAN 


V1 last updated MM-DD-YYYY 


Name of group/project Project name or research lab (for group plan) 


Funding body(ies) 


Partner organizations


Project duration Start: MM-DD-YYYY End: MM-DD-YYYY 
Date written MM-DD-YYYY 


Table of Contents 


1. Introduction
2. Data Types
3. Data Organization, Documentation, and Metadata
4. Data Access and Intellectual Property
5. Data Sharing and Reuse
6. Data Preservation and Archiving

1. Introduction 

The research project described in this data management plan (DMP)... 
2. Data Types 


The types of data generated and/or used in this project include... 
Section 2 Checklist 


• What type of data will be produced?
• How will data be collected? In what formats?
• How to document data collection?
• Will it be reproducible? What would happen if it got lost or became unusable later?
• How much data will it be, and at what growth rate? How often will it change?
• Are there tools or software needed to create/process/visualize the data?
• Will you use preexisting data? From where?
• Storage and backup strategy?

3. Data Organization, Documentation, and Metadata


The plan for organizing, documenting, and using descriptive metadata to assure quality control and 
reproducibility of these data includes . . . 


Section 3 Checklist 


• What standards will be used for documentation and metadata?
• Is there a good project and data documentation format/standard?
• What directory and file naming convention will be used?
• What project and data identifiers will be assigned?
• Is there a community standard for metadata sharing/integration?


4. Data Access and Intellectual Property 
The data have the following access and ownership concerns. . . 


Section 4 Checklist 


• What steps will be taken to protect privacy, security, confidentiality, intellectual property, or
  other rights?
• Does your data have any access concerns? Describe the process someone would take to access
  your data.
• Who controls it (e.g., principal investigator, student, lab, university, funder)?
• Any special privacy or security requirements (e.g., personal data, high-security data)?
• Any embargo periods to uphold?


5. Data Sharing and Reuse 


The data will be released for sharing in the following way . . . 
Section 5 Checklist 


• If you allow others to reuse your data, how will the data be discovered and shared?
• Any sharing requirements (e.g., funder data sharing policy)?
• Audience for reuse? Who will use it now? Who will use it later?
• When will I publish it and where?
• Tools/software needed to work with data?

6. Data Preservation and Archiving


The data will be preserved and archived in the following ways . . . 
Section 6 Checklist 


• How will the data be archived for preservation and long-term access?
• How long should it be retained (e.g., 3-5 years, 10-20 years, permanently)?
• What file formats? Are they long-lived?
• Are there data archives that my data is appropriate for (subject-based? Or institutional)?
• Who will maintain my data for the long-term?
APPENDIX D: Assessment Form 1: Follow-Up Satisfaction Survey


Data Management Course Evaluation 


Thank you for completing the Data Management Course (http://z.umn.edu/datamgmt). Your
feedback will help us to improve this course.


1. Course content was delivered in a clear manner.
a. Yes, I strongly agree
b. Yes, I agree
c. Neutral, unsure
d. No, I disagree
e. No, I strongly disagree
2. Course content was appropriate for my research area/focus.
a. Yes, I strongly agree
b. Yes, I agree
c. Neutral, unsure
d. No, I disagree
e. No, I strongly disagree
3. What did you find most useful about the course?
4. How might we improve the course?
5. Please provide any additional comments or suggestions.
APPENDIX E: Assessment Form 2: 6-Month Follow-Up Survey


Data Management Course: Follow-Up 


We're interested to learn if participation in the fall 2012 Data Management Course
(z.umn.edu/datamgmt) impacted your data management behavior in the months following your participation.
*Required Question 
1. How useful was the storage and back-up plan portion of your data management plan?* 
a. Very useful: I employed the plan in storing my data 
b. Useful: I employed aspects of the plan in storing my data 
c. Not useful: I did not employ this portion of the data management plan in storing my 
data 
Comments: 
2. Which of the following describes your experience with organizing and documenting your
data?* Circle all that apply.
a. I created and employed a file naming structure that is clear and easy to understand
b. I created and employed a file naming structure that only I can understand
c. I did not use structured file naming
d. I employed a metadata schema for my data and applied it consistently during my research
e. I employed a metadata schema for my data and occasionally applied it during my research
f. I did not use a metadata schema
Comments:
3. How useful was thinking about data ownership and access to your data?* 
a. Very useful: This topic came up in my research and it was good to have anticipated the 
concerns in my plan 
b. Useful: It was worthwhile to consider this in my plan, but the issue never came up 
c. Not useful: This topic never came up during my research 
Comments: 
4. How useful did you find planning for data sharing and reuse?* 
a. Very useful: I've made my data available for reuse 
b. Useful: I'm glad to have a plan for sharing, if the request arises 
c. Not useful: I don't plan to share my data
Comments: 
5. How useful was planning for data preservation and archiving for your data?* 
a. Very useful: I'm archiving my data so that the files will be preserved for future use 
b. Useful: I'm glad to know about data preservation techniques, should I choose to
archive my data
c. Not useful: I don't plan to archive my data 
6. Anything else you would like to tell us about implementing your data management plan? 
INTRODUCTION


At the University of Oregon, our Data In- 
formation Literacy (DIL) team worked with 
a vegetation ecology research group that was 
in the final year of a 4-year grant-funded 
project. The purpose of the project was to 
study climate change impacts on Pacific 
Northwest prairie ecosystems. The librarian 
team consisted of the science data services 
librarian and the subject specialist for biol- 
ogy, environmental science, and geology. We 
partnered with a professor in the Depart- 
ment of Landscape Architecture within the 
School of Architecture and Allied Arts and a 
co-principal investigator (co-PI) on a climate 
change impacts (CCI) study. All other mem- 
bers of the team, including the lead investi- 
gator for the Department of Energy grant, 
were in the Institute of Ecology and Evolu- 
tion within the Department of Biology. The 
CCI research group composition changed as 
students completed projects, but at the out- 
set of our work, it consisted of two faculty, 
two postdoctoral research associates, three 
graduate students, and one research assistant 
who had completed an undergraduate degree 
in ecology. 

The CCI team investigated the impacts of 
increased temperature and precipitation on 
vegetation ecology in prairie ecosystems. The 
research used three localities, each with plots 
where temperature and precipitation were ar- 
tificially increased above ambient levels, and 
un-manipulated control plots for comparison. 
Team members researched a variety of factors, 
such as growth and reproduction of specific 
plant populations, transpiration rates, and 
soil characteristics, with individual projects 
within this larger context. 


LITERATURE AND ENVIRONMENTAL 
SCAN OF ECOLOGICAL DATA 
MANAGEMENT BEST PRACTICES 


To better understand the data management cul- 
ture of practice within ecology, as well as cur- 
rent theory and guidance, we examined the lit- 
erature on research data management (RDM) 
practices in biology, ecology, and aligned envi- 
ronmental fields, additional generic best prac- 
tices, and resources. 

The literature revealed a robust set of arti- 
cles on RDM in established ecological and sci- 
ence journals. The ecology and environmental 
sciences publications were useful not only be- 
cause of their applicability to the team’s needs, 
but also because sharing such resources from 
journals in their research domain might lend 
greater credibility to instructional efforts with 
the team. Data management, sharing prac- 
tices, and related topics have been presented 
in articles, reviews, and columns in journals 
such as the Bulletin of the Ecological Society of 
America (Borer, Seabloom, Jones, & Schild- 
hauer, 2009; Fegraus, Andelman, Jones, & 
Schildhauer, 2005), Trends in Ecology & Evo- 
lution (Madin, Bowers, Schildhauer, & Jones, 
2008; Michener & Jones, 2012), PloS ONE 
(Tenopir et al., 2011; Wieczorek et al., 2012), 
Global Change Biology (Wolkovich, Regetz, & 
O’Connor, 2012), and Ecological Informatics 
(Enke et al., 2012; Madin et al., 2007; Mi- 
chener, 2006; Michener, Porter, Servilla, & 
Vanderbilt, 2011; Veen, van Reenen, Sluiter, 
van Loon, & Bouten, 2012). 

These articles make the case for good 
data management practices and outline spe- 
cific steps that researchers can take to curate 
their data. One of the most informative and 
practical articles was Borer et al. (2009), which
we shared with the team as a pre-instruction 
session reading. The authors provided a list 
of basic data management steps that could be 
taken with ecology data, such as 


• using scripts to record statistical analyses;

• storing and sharing data in nonproprietary
  formats;

• archiving original raw data;

• using descriptive file naming;

• creating optimal spreadsheet structure
  and database schema;

• recording full taxonomic names;

• standardizing date and time formats;

• recording metadata early and frequently.
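
As an illustration of one item in the list above (standardizing date and time formats), the following is a minimal Python sketch, not something taken from Borer et al. (2009) or from the training session. It rewrites dates into the ISO 8601 form YYYY-MM-DD. The list of input formats is an assumption about what a field team might have typed into a spreadsheet, and the function name `to_iso_date` is hypothetical.

```python
from datetime import datetime

# Assumed input formats; replace with the formats actually present in the data.
# Note that "%m/%d/%Y" treats 10/03/2012 as October 3 (US order).
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y", "%B %d, %Y")


def to_iso_date(raw):
    """Return the date as an ISO 8601 string (YYYY-MM-DD), or raise ValueError."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"Unrecognized date format: {raw!r}")
```

Raising an error for unrecognized values, rather than guessing, keeps ambiguous entries visible so that they can be checked against the field notebook.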


More recent articles take a similar approach, 
such as advocating for the publication of bio- 
diversity data (Costello, Michener, Gahegan, 
Zhang, & Bourne, 2013), and highlighting 
steps that will make it easier for others to re- 
use the data one might publish (White et al., 
2013). 

Data practices in research teams are often 
not standardized (Borgman, Wallis, & Enyedy, 
2007) and vary from one person to another 
even within research teams under a common 
faculty member (Akmon, Zimmerman, Dan- 
iels, & Hedstrom, 2011). 

Science and engineering faculty interviewed 
at Purdue University and the University of Il- 
linois at Urbana-Champaign wanted graduate 
students to better understand and implement 
good metadata practices (Carlson, Fosmire, 
Miller, & Sapp Nelson, 2011). Metadata 
standards and usage have been discussed in a 
number of articles aligned with the CCI team’s 
ecology focus (Fegraus et al., 2005; Jones,
Schildhauer, Reichman, & Bowers, 2006;
Kunze et al., 2011; Madin et al., 2007, 2008;
Michener, 2006; Michener, Brunt, Helly,
Kirchner, & Stafford, 1997).

However, some scientists have been reluc- 
tant to provide metadata due to the time it 
would take to create and record it, concerns 
about misuse of data, and loss of intellectual 
property rights (Schmidt-Kloiber et al., 2012). 
Concerns about data ownership may have 
more to do with “scientific revenue” (Janßen et
al., 2011) than intellectual property that would
generate income, particularly since these are 
fields with less potential for monetization of re- 
search discoveries through technology transfer. 
Some posit that a consensus-driven agreement 
on data ownership is needed to further scien- 
tific collaboration and avoid conflict (Fraser 
et al., 2013). In an attempt to facilitate con- 
tinuing individual control over data sharing, 
some proposed an “account-based approach to 
data property rights management” (Janßen et
al., 2011, p. 617). A study of the Center for
Embedded Networked Sensing (CENS) noted
that data sharing transactions can resemble
the bartering of goods among trusted
colleagues (Wallis, Rolando, & Borgman,
2013).

There are, however, a growing number 
of influential proponents for open access to 
research data (Dryad, 2014; National Evo- 
lutionary Synthesis Center, n.d.). Funding 
agency requirements to share research data 
(Holdren, 2013) will likely accelerate the 
transition to practices and services in support 
of open data. Dryad provides a leading exam- 
ple of a data repository, with Creative Com- 
mons Zero (CC0) licensing for all submitted 
data. This is integrated with the publication 
review process for a growing number of ecol- 
ogy journals (Dryad, 2014). 
INTERVIEWS AND RESULTS


We conducted interviews with several members
of the CCI team using the DIL interview protocol
(available for download at
http://dx.doi.org/10.5703/1288284315510). Our
interviews were with the collaborating professor, a
postdoctoral fellow, the research assistant, and
two graduate students (one completing a master’s
degree, the other working on a doctorate).

Participants in the interviews provided 
descriptions of the data life cycles of their re- 
search, though data sharing processes and proj- 
ect close-out practices were less clear because 
they did not yet have experience in those areas. 

The team primarily collected and created 
tabular data, such as manually recorded field 
observation data that were later transcribed 
into spreadsheets, and data downloaded from 
field devices and sensors. At least one graduate
student planned to conduct laboratory analyses
of soil samples, but those tests did not begin
until a few months later. They compiled
tabular data using Excel and usually imported 
them into statistical programs for analysis (typ- 
ically SPSS, though PC-ORD and R were also 
noted). They graphed results for review, analy- 
sis, and presentation or publication using pro- 
grams such as SigmaPlot and GIMP. 

Interviewees were aware of the types (in- 
cluding format) and numbers of data files 
(computer files or data sheets) collected and 
created in their work at almost all stages of the 
data life cycle. Interviewees were less aware of 
the typical size of any given data file, but were 
also confident that the size and numbers were 
small compared to the storage space available 
on a typical laptop computer. 

Interviewees were generally comfortable 
using their data collection and analysis tools, 
though some were in the process of learning 
tools such as SigmaPlot. The type of statistical
analysis tools varied based on personal preference
and previous experience. Data conversions
were typically between Excel and .csv file
formats. In limited instances, there were
reprojections of spatial data sets.

Most group members were familiar with the 
concept of metadata, if not the actual term. 
The types of annotations and other descriptive
information associated with data collection 
varied slightly between individuals for their 
own unique project data. However, all individ- 
uals who collected data in the field used data 
sheets and field notebooks to annotate data 
collection issues. They backed up field notes 
by transcribing them from the field notebook 
to a lab book that did not leave the lab. The 
degree of detail in these records varied based 
on descriptions by the interviewees. Team 
members held differing views on how readily 
another person could reproduce their research 
or reuse the data if relying solely on the note- 
books and metadata. 

There was a lack of consistency across the 
team in file management practices, from file 
naming and version control, to storage and 
backup. All interviewees assumed that they 
would leave a copy of their data with the fac- 
ulty, but interestingly, faculty and students both 
assumed that lab notebooks were the property 
of the students. Interviewees expressed interest 
in establishing protocols for handing off work 
product to the PIs as they completed their re- 
spective research projects. Interview responses 
indicated that the participants were motivated 
to improve their practices, even as the grant ap- 
proached its closeout date. 

The team members used multiple storage 
locations, including external hard drives, per- 
sonal laptops, home computers, and a shared 
computer in the team’s research offices. All team 
members backed up their data; however, backup 
intervals differed from person to person. 
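
Because copies were spread across several devices and backed up on different schedules, one simple check is whether a backup copy still matches the working copy. The sketch below is illustrative only and was not part of the team's workflow. It assumes a source folder and a backup folder with the same layout and compares files by SHA-256 checksum; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_backup_problems(source_dir, backup_dir):
    """List (relative path, problem) pairs for files missing or changed in the backup."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue  # skip directories
        relative = source_file.relative_to(source_dir)
        backup_file = backup_dir / relative
        if not backup_file.is_file():
            problems.append((relative.as_posix(), "missing"))
        elif sha256_of(source_file) != sha256_of(backup_file):
            problems.append((relative.as_posix(), "differs"))
    return problems
```

An empty result means every file in the source folder has an identical copy in the backup folder. The check does not report extra files that exist only in the backup.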
Figure 8.1 Data information literacy competencies as rated by the University of Oregon
faculty and graduate students. Ratings based on a 5-point Likert scale: 5 = essential; 4 = very
important; 3 = important; 2 = somewhat important; 1 = not important.


Because few, if any, had used external data 
for their own research, and none had pub- 
lished data, their knowledge of practices and 
resources in these areas was limited. However, 
all expressed a willingness to share their data 
and felt that their data could provide a base- 
line for other studies on the effect of climate 
change on plant ecosystems. For this reason 
they believed that their data would be impor- 
tant for many years. Restrictions that they 
might impose on data sharing were primar- 
ily related to proper acknowledgment of the 
source. They were aware that some journals 
required the submission of associated data 
sets with a manuscript, but they did not know 
how the data would be annotated, preserved, 
or shared. Most interviewees reported that 
they had not received training in dealing with 
intellectual property and data ethics issues 
and had a limited understanding of privacy, 
confidentiality issues, and the university’s pol- 
icies on research. 


Educational Needs and Priorities 


The faculty member who participated in the
interview indicated that all 12 of the data literacy
competencies were important to the research
project. He felt that skills in each of the compe- 
tencies were needed to do proper research and 
that both he and the students would benefit 
from training in these areas (see Figure 8.1). 
The rest of the team agreed, at least conceptually,
about the importance of these data skills.
However, in comparison to the professor, the
other team members were not as familiar with
each of the concepts. Their ratings of the importance
of the competencies ranged from “important”
to “essential,” with the exception of one
“I don’t know” because of unfamiliarity with
metadata concepts. The team reported that
self-teaching (or trial and error), peer-to-peer, and
student-to-mentor (whether faculty or postdoc)
consultations were the common practice for
addressing RDM questions as they arose.
A DISCUSSION-BASED APPROACH
TO TEACHING DATA INFORMATION 
LITERACY SKILLS 


We scheduled our instruction for the group to 
be completed during the fall quarter of 2012, 
which was also the final quarter of their 4-year 
grant. Seasonal and weather-dependent field 
data collection events could not be delayed; 
the potential data to be collected would be irre- 
producible. With these pressures on the faculty 
and the rest of the research team, it was reason- 
able to expect that our access to the team for 
instruction would be limited. 

We negotiated with the two faculty mem- 
bers to schedule a 1.5-hour session in place of 
a regular team meeting in October. The ses- 
sion incorporated lecture, group exercises, and 
discussion. Providing training for a small team 
of research scientists enabled us to design and 
present the instruction in an informal, conver- 
sational setting. 

After reviewing the interviews and the re- 
sults of our literature review, we developed a 
data management training session on the fol- 
lowing: 


• Metadata as it relates to documenting,
  sharing, finding, and understanding data

• File naming

• Data structure and recording methods

• Data repositories and shared data

• Commonly accepted lab notebook policies

• Data ownership and preservation


We believed it would be unrealistic to ex- 
pect the team to implement many new prac- 
tices with only a few months left in the project. 
However, these topics and resources might be
applied when handing off data to the faculty
and when publishing research results, and the
skills would be applicable to future projects. The
topics and respective learning outcomes that
we generated for our DIL program are displayed
in Table 8.1.

To develop a foundational link to cultures 
of practice, we provided two assigned readings 
from the research domain prior to the instruc- 
tion session and then integrated them into the 
discussions. A third reading was included to 
highlight typical policies and best practices for 
research notebooks. The readings were 


• “Some Simple Guidelines for Effective
  Data Management” from the Bulletin of
  the Ecological Society of America (Borer et
  al., 2009);

• a Global Change Biology article on the
  need for open science and good data
  management for advancing global change
  research (Wolkovich, Regetz, & O’Connor,
  2012);

• an online chapter on lab notebook policies
  and practices (Thomson, n.d.).


The research team had some turnover be- 
tween our interviews and the instruction ses- 
sion. Six people attended the training: two fac- 
ulty, two postdocs, and two graduate students. 
Only two of this group had participated in the 
interviews: our faculty partner and one gradu- 
ate student. 


Instructional Components 


We created a session outline which included 
links to examples presented in the class, addi- 
tional resources, and references (see Appendix 
A to this chapter). 

We anticipated that the readings we as- 
signed before the team meeting would pro- 
vide shared understanding and starting points 
for some of the discussion. The instruction 
TABLE 8.1 Learning Outcomes for the University of Oregon Training Session


Topics and Learning Outcomes


File formats and conversions

• Is aware of and accounts for interoperability issues throughout the data life cycle:
  considers impacts that proprietary file formats, identifiers, and data access can have
  on linked data/Semantic Web, and so forth
• Knows how and why to convert files from one format to another and does so
  consistently


Publishing data

• Knows where to find relevant data repositories and how to evaluate and select where
  to deposit data, and where to get data
• Publishing data with Nature, other journals, Dryad?


Preservation and archiving

• Knows what data preservation is, why it is important, and what it costs; employs some
  evaluative criteria in choosing what to preserve and for how long
• Records metadata in the repository so others can find, understand, use, and properly
  cite the data set
• Knows how to properly package and hand off the data to the PI at the close of his or
  her participation in a project


Data citation

• Correctly cites data from external sources
• Knows what a unique identifier is, and its utility for data citation
• Knows how to publish/share data/identifiers
• Understands usage permissions issues, and permissions management tools and
  restrictions such as creative commons, copyright, and data commons


session was a combination of lecture with 
slides, online resources, hands-on activities, 
and discussion. Some of the presentation 
slides were taken from education modules by 
the DataONE project. 

The instruction session began with why data 
management is important, the risks of poor 
data practices, and the value of sharing data to 
the researcher, scientific community, sponsor, 
and the public. 

To direct a discussion of the chapter about 
lab notebook policies and practices, we asked: 
(1) What policies or guidelines were new 
to you? and (2) Is there anything you might 
change or do differently in light of the guide- 
lines? Here the discussion turned to concerns 
about the applicability of the notebook prac- 
tices and policy materials to field research 
note taking. We highlighted roles and
responsibilities for data and notebook stewardship,
indicating that these typically are not the
property of graduate students, but remain with the
PI as a representative of the institution when
projects are completed.

Next we looked at file management, review- 
ing common file naming conventions outlined 
on the University of Oregon data management 
website, followed by data backup consider- 
ations and file conversions and transforma- 
tions. We discussed data structures and used a 
short exercise to test whether they could iden- 
tify errors in a spreadsheet. This exercise was 
based on materials from the DataONE project. 
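
A file naming convention is easiest to keep when it can be checked automatically. The following Python sketch assumes a hypothetical convention of the form project_site_YYYYMMDD_vNN.ext. This pattern is an illustration only and is not the convention published on the University of Oregon data management website. The sketch splits a conforming name into its parts and returns None for a name that does not follow the pattern.

```python
import re

# Hypothetical convention: <project>_<site>_<YYYYMMDD>_v<NN>.<csv|txt>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<site>[a-z0-9]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>csv|txt)$"
)


def parse_file_name(name):
    """Return the parts of a conforming file name as a dict, or None if it does not conform."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None
```

Under this assumed pattern, a name such as cci_site1_20121015_v02.csv is accepted, while a name such as "final data (new).xlsx" is rejected because it contains spaces, lacks a date and version, and uses a proprietary format.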

Several members of the group reported in 
the interviews that they did not use relational 
databases for data and were not confident with 
these concepts. To demonstrate some basic 
structures of relational databases, we created a 
hands-on exercise using “flat files” (which were 
titled sheets of paper) that could be organized 
into relationships of one-to-one, one-to-many,
and many-to-one. The participants arranged 
the files in a manner that represented data simi- 
lar to what they might collect and that showed 
the relationships of the files. 
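
The paper exercise can also be mirrored in a small database. The sketch below uses Python's built-in sqlite3 module to show a one-to-many relationship: one plot has many observations. The table names, column names, and sample rows are invented for illustration and are not the CCI team's data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a one-to-many plot/observation schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the plot reference
    conn.executescript("""
        CREATE TABLE plot (
            plot_id   INTEGER PRIMARY KEY,
            site      TEXT NOT NULL,
            treatment TEXT NOT NULL
        );
        CREATE TABLE observation (
            obs_id        INTEGER PRIMARY KEY,
            plot_id       INTEGER NOT NULL REFERENCES plot(plot_id),
            obs_date      TEXT NOT NULL,
            species       TEXT NOT NULL,
            n_individuals INTEGER NOT NULL
        );
    """)
    # Invented sample rows: two plots, three observations.
    conn.executemany(
        "INSERT INTO plot VALUES (?, ?, ?)",
        [(1, "site_a", "control"), (2, "site_a", "heated")],
    )
    conn.executemany(
        "INSERT INTO observation (plot_id, obs_date, species, n_individuals) "
        "VALUES (?, ?, ?, ?)",
        [(1, "2012-06-01", "species_x", 4),
         (1, "2012-06-15", "species_x", 6),
         (2, "2012-06-01", "species_x", 3)],
    )
    conn.commit()
    return conn


def observations_per_plot(conn):
    """Return (plot_id, treatment, number of observations) for every plot."""
    return conn.execute("""
        SELECT p.plot_id, p.treatment, COUNT(o.obs_id)
        FROM plot AS p
        LEFT JOIN observation AS o ON o.plot_id = p.plot_id
        GROUP BY p.plot_id
        ORDER BY p.plot_id
    """).fetchall()
```

Storing the treatment once in the plot table, rather than repeating it on every observation row, is the design point the flat-file exercise illustrates: each fact is recorded in one place and linked by an identifier.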

We reviewed Dryad and DataONE Mercury 
as two examples of ecological data repositories. 
Navigating to and examining data sets in these 
two resources provided a concrete introduction 
to data repositories, metadata standards, data 
set registration, unique identifiers and DOIs, 
and linking between data and publications. 
The data sets provided a foundation for a dis- 
cussion about publishing data and access and 
use permissions. 

Finally we highlighted the most commonly 
noted parts of a data citation from the litera- 
ture, and then opened the rest of the session to 
questions and discussion about topics of inter- 
est to the team. 
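
To make the parts of a data citation concrete, the sketch below assembles one from named components. The chosen components (creators, year, title, repository, identifier) and the output layout are assumptions made for illustration. They are not the specific list presented in the session, and repositories and journals may require a different style. All values in the usage example are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Assumed components of a data citation; adjust to the target repository's style."""
    creators: List[str]
    year: int
    title: str
    repository: str
    identifier: str  # a persistent identifier such as a DOI

    def format(self):
        names = ", ".join(self.creators)
        return (f"{names} ({self.year}). {self.title} [Data set]. "
                f"{self.repository}. {self.identifier}")


# Placeholder values only; not a real data set or identifier.
example = DataCitation(
    creators=["Researcher, A.", "Researcher, B."],
    year=2012,
    title="Example prairie plot data",
    repository="Example Repository",
    identifier="https://doi.org/10.xxxx/example",
)
```

Calling `example.format()` returns a single citation string. Keeping the components as separate fields means the same record can be reformatted if a journal asks for a different style.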


Assessment 


We based our assessment of the DIL program 
on discussions in the training session, infor- 
mation gathered in two post-training surveys, 
and conversations and e-mail correspondence 
with the faculty and other team members. (The 
training feedback survey questions are in Ap- 
pendix B to this chapter.) We collected the ini- 
tial feedback via a Google form linked from the 
instructional materials. Five of the six attendees 
filled out the form, while two responded to a 
more detailed Qualtrics survey that we distrib- 
uted later. The two faculty were also asked for 
more information several months later. This 
section summarizes the collected comments 
and suggestions and our own observations. 
The results of our assessment indicated 
that we had raised awareness of data manage- 
ment issues and positively impacted the team. 
Some team members reported that the initial
interviews prompted them to think more deeply
about how they managed their research data. 
One researcher reported that since the instruc- 
tional session the team became more cognizant 
of data management issues and began to embrace 
new practices. In particular, the team was more 
conscientious about providing detailed descrip- 
tive information (metadata) in notebooks and 
electronic records, and the lead faculty member 
for the project requested that data sets be shared 
with him in non-proprietary formats to ensure 
long-term access. Team members reported pay- 
ing closer attention to data storage, preservation, 
and sharing issues. More specifically, team mem- 
bers said they planned to 


• “do a better job of planning for data
  management at the onset of a project”;

• “explore my options for online backups
  of my data”;

• “save long-term data in a .csv format and
  provide metadata for that file.”


One of the faculty reported that the train- 
ing had “brought me up to date with growing 
expectations for sharing of data... gave me 
deeper impetus to apply sound meta practices 
so that future users could understand how and 
why data was developed and processed the 
way it was.” The sessions “changed the degree 
to which we systematically apply protocols for 
data management across all aspects of the proj- 
ect. They also gave us useful insight into the 
resources available for data curation.” 

The team valued guidance that was either 
very closely aligned with the team’s data acqui- 
sition practices or easily translated into their 
workflow and publication processes. Several 
respondents said they appreciated the open 
discussion on specific needs and questions that 
occurred at the end of the session. Several said 
they would have rather spent more time in 
interactive work with an immediate applica-
tion to their current research and data manage- 
ment tasks, and less time on overview and basic 
instruction. 

The article by Borer and colleagues (2009) 
that provided data management guidelines was 
particularly well received and provided a use- 
ful introduction to a number of practices that 
were at the heart of the session. The article by 
Wolkovich, Regetz, and O’Connor (2012) 
was not mentioned as often in the assessment, 
but it provided a strong case for data sharing 
in the multidisciplinary field of global change 
research, the very topic of the CCI project. 
Though not its primary focus, the article in- 
cluded a useful table listing some of the actions 
and skills needed for data and code sharing, as 
well as supporting website links. We included 
the chapter by Thomson on lab notebooks
in our DIL Program as it had been used by a 
faculty member in the Department of Human 
Physiology to introduce good notebook prac- 
tices to new graduate students. However, the 
chapter elicited several surprisingly strong neg- 
ative comments from other participants. One 
of the faculty and at least one postdoc in the 
CCI group believed it had no application to 
their research workflow. Admittedly, the guide- 
lines were established for a research laboratory 
setting more typical of biochemistry than ecol- 
ogy, but we had believed readers could inter- 
pret and apply the recordkeeping guidelines to 
other forms of research documentation. 


DISCUSSION 


One of the strengths of the DIL model is that 
the structured interviews provide librarians 
with a detailed understanding of the RDM 
practices, skills, and priorities of a particu- 
lar person or team. That information and the
literature translate to targeted instructional
interventions. Training can be tailored to the 
specific needs of the research group, though the 
amount of content will be determined by the 
length and number of sessions that can be ac- 
commodated by the research team’s schedules 
and faculty prerogatives. 

The interview process can open new lines of
communication and opportunities to provide 
RDM services to research faculty, graduate stu- 
dents, postdocs, and research assistants. The 
interviews and associated conversations raise 
awareness of library services for research scien- 
tists. For the librarians, these experiences can 
provide insight into the needs of graduate stu- 
dents, and enable librarians to expand their un- 
derstanding of the research domains they serve. 

The instruction session included concep- 
tual information for the competencies and ex- 
amples of applied RDM principles. The CCI 
group clearly favored context-based applied
learning and application exercises for their
instruction. We incorporated some lecture
and slides to provide context for some of the
DIL competencies. In retrospect, the Borer
article was well received and might have suf- 
ficed since it grounded the topics in an ecology 
research ethos. The lecture was not as produc- 
tive nor well received in this small group set- 
ting. In the future we plan to put much more 
emphasis on localized use cases, applied prac- 
tices, and open discussion. 

Developing specific and relevant DIL pro- 
grams can be time consuming, but it will result 
in a more engaged group that can adopt new 
skills toward implementation of better RDM
practices. To be effective, DIL programs have to
respond to the needs of researchers within the
environment they inhabit. Researchers are un- 
der pressure, particularly when time-sensitive 
field work is on the line. They also want more 
efficient workflows so they can increase their 
productivity. This is reflected in a desire to 
have more immediate application outcomes, 
through both streamlined and timely instruc- 
tion and demonstrable improvements in RDM 
practices. Librarians can gain support for train- 
ing by connecting learning outcomes to a po-
tentially lower risk of data loss, higher research
impact, more collaborations, more competitive 
funding proposals, and more efficient data or- 
ganization and search and discovery. 

There are several considerations in apply- 
ing the DIL model to smaller research teams. 
Even with small groups consisting of PIs, re-
search associates and postdocs, and graduate
students, there may be a high degree of vari- 
ability in skills across the team, and individu- 
als may be engaged in highly differentiated 
projects of their own with unique workflows 
and data management concerns. This will need
to be addressed in planning the instruction,
and probably acknowledged at the outset of
any training. Highly stratified skill sets might
be accommodated by distributing this expertise
across groups if the team is large enough.
In our case the climate
change project provided a unifying theme and 
data sources, and there was some uniformity 
due to shared project management and logis- 
tics, as well as common research methods and 
workflows across the group. 
Should we work with another group that 
relies on field data collection, we will focus 
instruction on field notes and documentation
methods, and fill in any gaps about policy ap-
plication, rather than providing laboratory 
notebook guidance. Clearly several members of 
the team were looking for materials specific to 
the form and content of documentation they 
were using in the field. 

In most of the data librarians’ discussions
with researchers about RDM, faculty typically 
preferred that we speak directly with the gradu- 
ate students and postdocs who were conducting 
research. Faculty were reluctant to unilaterally 
impose RDM practices on the team. However, 
faculty buy-in is critical, and a professor can 
exert a lot of influence on the DIL process, 
whether through the degree of librarian access 
to the students, or via the values and attitudes 
they impart to the team regarding data sharing 
and funding agency requirements. This should 
be kept in mind as librarians select faculty 
partners and research teams for the significant 
investment that the DIL model requires. Simi- 
larly, creating and nurturing a good working 
relationship with the team is important and 
can lead to other collaborations and support 
opportunities after the initial instruction has 
been provided. 

There are other considerations to be made in 
selecting groups to participate in implement- 
ing the DIL model. The academic calendar and 
grant cycle must be considered when thinking 
about optimal timing for scheduling interviews 
and instruction events. These factors may un- 
duly compress the window of opportunity for 
interactions with the students. The number of 
master’s students and PhD candidates who are 
on the team and at what stage they are in their 
program may influence the type and timing of 
instruction you can implement. 

ADVANTAGES

• Deeper understanding of specialized RDM practices
• More communication with faculty, grad students, postdocs, research assistants
• New opportunities to provide RDM services


LESSONS LEARNED

• Use contextual applications
• Streamline instruction
• Provide instruction at point-of-need
• Consider highly variable skill levels
• Work with faculty who have RDM “buy-in”
• Create and nurture good relationships with research team
• Consider academic calendar and grant project timing in scheduling interviews and instruction


A 1.5-hour training session is an effective vehicle for developing DIL competencies


The educational experiences of the team
members may sometimes lead to unfore-
seen ideas. We were working with a relatively
small research group and chose to expand our
investigation of the team’s practices by includ-
ing a postdoc and a research assistant in the in- 
terviews. The research assistant, who had not 
yet started a graduate program, received what 
we considered to be excellent training in re- 
cording metadata as an undergraduate student. 
She had worked at a field station previously, 
where students were required to document field
work with metadata and pass reviews of their 
field notes before they could begin their own 
projects. Data sets from the students’ field proj- 
ects were deposited for public access. This type 
of experiential learning, integrated directly 
with and reinforced by reviews of ongoing re- 
search practice, is a model that we plan to ex- 
plore further. 

The DIL project may ultimately highlight 
skills that should be integrated into the cur- 
riculum for all STEM students. Within the 
CCI team a few specific components of DIL 
are addressed to varying degrees. For instance, 
our faculty partner in this project remarked 
that training in information presentation and
graphics is a required aspect of the curriculum
for students in his department (landscape ar- 
chitecture). In contrast, typical biology stu- 
dents learned data visualization on their own 
or tangentially through exposure to graphing 
in foundational statistics courses. 


CONCLUSIONS 


The DIL model was a very useful tool in de- 
veloping DIL training for graduate students. 
The process provides a useful categorization 
of RDM skills through which research fac- 
ulty can articulate areas of concern and priori- 
ties for skill development for themselves and 
their graduate students. Structured interviews 
of the students enabled us to identify the data 
management skills and perspectives of gradu- 
ate students conducting research on vegetation 
ecology, and to prepare, present, and assess an 
instructional session with the team. 

Research teams do not always have time 
for long-term instructional interventions, par- 
ticularly when grant deadlines are looming. In 
these situations, shorter, discussion-based ses- 
sions focused on specific local DIL issues can 
yield a measurable positive impact on graduate 
student RDM skills and attitudes. 

It would be risky to assume that the needs 
and learning outcomes from this particular 
team were the same as those from other ecol- 
ogy research teams. Taken with care, however, 
the literature and lessons we learned about 
RDM practices and DIL instruction through 
working with this team provided us with a 
good foundation for working with other grad- 
uate students who conduct field research in the 
biological sciences. 

Our results also informed the model by 
showing that a 1.5-hour training session can be 
an effective way of supporting and developing 
graduate student DIL competencies. How-
ever, there are caveats to the method. A short
window for instruction significantly limits the 
number of topics and degree of detail to be 
covered. Various aspects of the training may 
gain more support if they are previewed or ne- 
gotiated with the faculty partner(s). There are 
many factors that will affect uptake, but ac- 
tive, context-based learning activities and dis- 
cussions carry the potential to help graduate 
students understand these skills and integrate 
them into their research practices. 

Finally, positive and supportive interactions 
with graduate students can set the stage for fur- 
ther instructional efforts and other RDM ser- 
vices by librarians. 


NOTE 


This case study is available online at http://dx.doi.org/10.5703/1288284315480.
APPENDIX A: Data Information Literacy Workshop


Readings 


Borer, E. T., Seabloom, E. W., Jones, M. B., & Schildhauer, M. (2009). Some simple guide- 
lines for effective data management. Bulletin of the Ecological Society of America, 90(2), 
205-214. http://dx.doi.org/10.1890/0012-9623-90.2.205 

Thomson, J. A. (2007). How to start—and keep—a laboratory notebook: Policy and practical
guidelines. In A. Krattiger, R. T. Mahoney, & L. Nelson (Eds.), Intellectual property man-
agement in health and agricultural innovation: A handbook of best practices (Chapter 8.2).
Oxford: MIHR. Retrieved from http://www.iphandbook.org/handbook/ch08/p02/

Wolkovich, E. M., Regetz, J., & O’Connor, M. I. (2012). Advances in global change research 
require open science by individual researchers. Global Change Biology, 18(7), 2102-2110. 
http://dx.doi.org/10.1111/j.1365-2486.2012.02693.x 


Why Manage Research Data? 
First, what is data management? 
1. Taking good care of data throughout the data life cycle 
2. Some basic aspects of data management: http://library.uoregon.edu/datamanagement/index.html
Why is it important? 
1. Efficiency: It’s easier to collaborate, review, and share data when they are well organized
and described
2. Protects the investment of time, money, and intellectual effort
3. Protects unique data that cannot be duplicated
4. Improves capacity to share data
a. Some research funders require data sharing 
b. Journals and associations increasingly require data sharing 
i. Current Ecological Society of America (ESA) editorial policy on data sharing: The 
editors and publisher of this journal expect authors to make the data underlying 
published articles available 
ii. Dryad associations/journals: http://datadryad.org/pages/jdap 
iii. Nature: http://www.nature.com/authors/policies/availability.html 
c. Benefits of data sharing
i. Encourages scientific enquiry and debate 
ii. Promotes innovation and potential new data uses 
iii. Leads to new collaborations between data users and data creators 
iv. Maximizes transparency and accountability
v. Enables scrutiny of research findings
vi. Encourages the improvement and validation of research methods
vii. Reduces the cost of duplicating data collection
viii. Increases the impact and visibility of research
ix. Promotes the research that created the data and their outcomes
x. Can provide a direct credit to the researcher as a research output in its own right
xi. Provides important resources for education and training
xii. Sharing data leads to increased citation


Lab Notebook Guidelines


1. What policies or guidelines were new to you? 


2. Is there anything you might change or do differently in light of the guidelines? 


File Naming and Organization 


1. Things to consider: Informative names, hierarchical searching, and stage in the data life cycle
2. Attributes of appropriate names: Year-month-day, creator, and stage of data analysis (the
term draft may be too ambiguous), post R, PreJohnsonReview (using Camelcaps)
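
As one possible illustration of these naming attributes, the short Python sketch below assembles a file name from a date, a creator, and an analysis stage. The function name build_filename, the underscore separator, the field order, and the sample values are assumptions made for this example and are not part of the workshop materials.

```python
from datetime import date


def build_filename(day: date, creator: str, stage: str, extension: str) -> str:
    """Join year-month-day, creator, and analysis stage into one file name.

    Hypothetical helper: the field order and the underscore separator are
    illustrative choices, not a rule taken from the workshop.
    """
    # ISO 8601 dates (YYYY-MM-DD) sort chronologically in a file listing.
    parts = [day.isoformat(), creator, stage]
    # Remove spaces so the name is easy to use in scripts on any platform.
    cleaned = [part.replace(" ", "") for part in parts]
    return "_".join(cleaned) + "." + extension.lstrip(".")


if __name__ == "__main__":
    # Sample values are invented for the demonstration.
    print(build_filename(date(2012, 6, 14), "Smith", "PreJohnsonReview", "csv"))
```

Putting the date first keeps files in chronological order when a folder is sorted by name, which may help with hierarchical searching.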


Backups and Archiving: Comparing Backups to Archives


1. Backups
a. Used to take periodic snapshots of data in case the current version is destroyed or lost
b. Backups are copies of files stored for short or near long term
c. Often performed on a somewhat frequent schedule


2. Archiving
a. Used to preserve data for historical reference or potentially during disasters
b. Archives are usually the final version, stored long term, and generally not copied over
c. Often performed at the end of a project or during major milestones
d. National Science Foundation (NSF) data management plan (DMP) guidelines men-
tion “archives” for data; they mean an open/accessible archive for sharing the data, not
unshared storage


3. Why back up data?
a. Limit or negate loss of data, some of which may not be reproducible
b. Save time, money, productivity
c. Help prepare for disasters
d. In case of accidental deletions
e. In case of fires, natural disasters
f. In case of software bugs, hardware failures
g. Reproduce results of past
h. Procedures (if they were based on older files)
i. Respond to data requests


4. Other considerations
a. How often should you do backups?
i. Continually? Daily? Weekly? Monthly?
ii. Cost versus benefit
b. What kind of backups should you perform?
i. Partial: Backing up only those files that have changed since the last backup
ii. Full: Backing up all files
iii. How often and what kind will depend upon what kind of data you have and how
important it is
c. What about non-digital files (such as papers)?
i. Consider digitizing files
d. Keep backups in different location than source data
e. Keep the following in mind
i. What does not need to be backed up?
ii. How long should you keep backups?
iii. How do you pay for the storage space?
iv. What is the plan for when the grant ends/funding runs out?
f. Check backups on a regular basis
g. Meet with your IT support and set up a backup plan
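
The difference between partial and full backups described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a backup tool: the function name files_to_back_up and the use of file modification times to detect changes are choices made for this example.

```python
from pathlib import Path
from typing import List, Optional


def files_to_back_up(source_dir: Path, last_backup_time: Optional[float]) -> List[Path]:
    """Select the files for a full or a partial backup.

    last_backup_time is in seconds since the epoch (as returned by
    time.time()). Pass None to request a full backup.
    """
    # Walk the whole directory tree and keep regular files only.
    all_files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    if last_backup_time is None:
        # Full backup: every file is selected.
        return all_files
    # Partial backup: only files modified since the last backup.
    return [p for p in all_files if p.stat().st_mtime > last_backup_time]
```

A script built around such a function would still need to copy the selected files to a different location than the source data and to check the copies on a regular basis, as the outline recommends.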


File Types, Conversions, Transformations 


1. Workflow: How are the data handled, changed, refined, and analyzed?
a. Use tools that employ and record scripts
2. Terminology
a. Conversion: From one format to another, such as Excel to .csv, or .bmp to .jpg
b. Transformation: Changing the structure of the data, from spreadsheets to a relational
database, or statistical meaning (e.g., applying a log function)

The file type, the software, the computer operating system and hardware can all influ-
ence what data are available and what might be lost during conversion and transfor-
mation processes
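
To make the two terms concrete, the following Python sketch contrasts a conversion with a transformation. It is an illustration only: a tab-separated file stands in for the Excel example because the standard library cannot read Excel workbooks, and the function names and sample values are assumptions made for this example.

```python
import csv
import io
import math


def convert_tsv_to_csv(tsv_text: str) -> str:
    """Conversion: the values stay the same, only the file format changes."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def log_transform(values):
    """Transformation: the statistical meaning of the values changes."""
    return [math.log10(v) for v in values]


if __name__ == "__main__":
    # Invented sample data for the demonstration.
    print(convert_tsv_to_csv("plot\tbiomass\nA\t100\n"))
    print(log_transform([1, 10, 100]))
```

Because both steps are recorded as a script, the workflow can be rerun and reviewed later, which is the point of using tools that employ and record scripts.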


Data Structures and Cleanup 


1. Spreadsheets versus databases
a. Spreadsheets are great for calculating changes in data
b. Databases are better for organizing and standardizing data
i. Easy to see the full lists of variables
ii. Can be queried
iii. Allow for easy detection of variations in variable names
iv. Allow for easy update of variable names or additions
v. Control the data entry process to prevent wrong entries
vi. Minimize redundant data
vii. Minimize redundant data entry
2. DataUp—try it out on one of your spreadsheets: http://dataup.cdlib.org/
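
Two of the database advantages listed above, controlling data entry and detecting variations in names, can be sketched with Python's built-in sqlite3 module. The table name, column names, and sample records below are hypothetical and were invented for this example.

```python
import sqlite3


def create_database() -> sqlite3.Connection:
    """Create an in-memory database with one constrained table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE observation (
            plot_id TEXT NOT NULL,
            species TEXT NOT NULL,
            cover_percent REAL NOT NULL
                CHECK (cover_percent BETWEEN 0 AND 100)
        )
        """
    )
    return conn


def add_observation(conn, plot_id, species, cover_percent) -> bool:
    """Insert one record; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO observation VALUES (?, ?, ?)",
                (plot_id, species, cover_percent),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK or NOT NULL constraint blocked a wrong entry.
        return False


def species_names(conn):
    """Query the distinct names so spelling variations stand out."""
    rows = conn.execute("SELECT DISTINCT species FROM observation ORDER BY species")
    return [row[0] for row in rows]
```

In this sketch a cover value of 150 would be refused at entry, and a query of the distinct species names would show "Bromus" and "bromus" side by side, making the inconsistency easy to spot and correct.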


Data Repositories and Records 


1. Examples of data repositories 


a. Dryad: http://datadryad.org/ 

b. LTER: https://metacat.lternet.edu/das/lter/index.jsp

c. DataONE: https://cn.dataone.org/onemercury/ 

d. GenBank: http://www.ncbi.nlm.nih.gov/genbank/ 

e. TreeBase: http://treebase.org/treebase-web/home.html

f. EcoTrends: http://www.ecotrends.info/ 

g. Ecological Archives: http://esapubs.org/archive/default.htm 

h. ESA Data Registry and Archive: http://data.esa.org/esa/style/skins/esa/index.jsp 
i. Knowledge Network for Biocomplexity (KNB): https://knb.ecoinformatics.org/index.jsp 
j. NCEAS: https://knb.ecoinformatics.org/knb/style/skins/nceas/ 

k. See also: http://library.uoregon.edu/datamanagement/repositories.html 


2. What does a shared data set look like? 
a. Examine the following two examples of data records 

i. Dryad: http://dx.doi.org/10.5061/dryad.d2c619hd for Stanton-Geddes, J., Tiffin, P.,
& Shaw, R. G. (2012). Role of climate and competitors in limiting fitness
across range edges of an annual plant. Ecology, 93(7), 1604-1613.
http://dx.doi.org/10.1890/11-1701.1

ii. DataONE: https://cn.dataone.org/onemercury/send/xsltText2?pid=scimeta_472.xml&fileURL=https://cn-orc-1.dataone.org/cn/v1/resolve/scimeta_472.xml&full_datasources=ORNL%20DAAC&full_queryString=%20(%20text%20:%20oregon%20)%20OR%20%20(%20text%20:%20climate%20)%20AND%20has%20data&ds_id=#top


Metadata 


1. Exercise: Look at the data sets and describe five things you would want to know in
order to use these data
a. What are the data gaps?
b. What processes were used for creating the data?
c. Are there any fees associated with the data?
d. In what scale were the data created?
e. What do the values in the tables mean?
f. What software do I need in order to read the data?
g. What projection are the data in?
h. Can I give these data to someone else?


2. Is this information different than the information you would want to include if you were
sharing your data?
a. Why were the data created?
b. What limitations, if any, do the data have?
c. What do the data mean?
d. How should the data set be cited if it is reused in a new study?
e. How would you cite the data?
f. Why include a unique identifier (to cite) the data?


3. Metadata defined: The information about the data set that helps you and other people
a. Discover
b. Comply with permissions
c. Download
d. Open
e. Understand/interpret
f. Cite


4. Only data that can be found is useful. Metadata is what is needed to find and understand
the data
a. Who created the data?
b. What is the content of the data?
c. When were the data created?
d. How were the data developed?
e. Why were the data developed?


5. Where might metadata be recorded?
a. Internal to the file
i. Embedded in file header (image files, MP3s)
ii. Added to the file (column names, keys)
iii. Within the file name
b. External to the file
i. Indexes
ii. Separate metadata files
iii. Readme.txt
c. Any of these sources of information could be altered or lost if care is not taken when
files are edited or converted


6. NSF DMP guidelines refer to “metadata standards.” What does this mean?
a. An agreed-upon information structure in which the metadata is stored, often XML
i. Facilitates computer exchange and linking, sorting, searching
(a) EML—ecological metadata language (structure): http://knb.ecoinformatics.org/#tools/eml
and Morpho (tool) https://knb.ecoinformatics.org/morphoportal.jsp
(b) Dryad: uses Dublin Core metadata for basic information
b. A shared set of terms and definitions. Ontologies are still poorly developed, but are
useful
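
The who, what, when, how, and why questions in the metadata outline above lend themselves to a simple external metadata file. The Python sketch below is one hedged way to build such a record and to flag unanswered questions. The field names and the choice of JSON are assumptions made for this example; they do not follow EML, Dublin Core, or any other formal metadata standard.

```python
import json


def build_metadata(who: str, what: str, when: str, how: str, why: str) -> dict:
    """Collect answers to the five basic metadata questions.

    The keys are hypothetical labels chosen for this illustration.
    """
    return {
        "creator": who,
        "content": what,
        "date_created": when,
        "methods": how,
        "purpose": why,
    }


def missing_fields(metadata: dict) -> list:
    """List the questions that were left blank."""
    return sorted(key for key, value in metadata.items() if not str(value).strip())


def to_readme_text(metadata: dict) -> str:
    """Serialize the record as text suitable for a separate metadata file."""
    return json.dumps(metadata, indent=2, sort_keys=True)
```

Keeping the record in a separate plain-text file means it survives when the data file is converted to another format, although it must then be stored and shared together with the data.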


Concerns and Permissions 


1. What are some concerns about sharing data?


2. How can metadata help address those concerns? 


a. Guide to Open Data Licensing: http://opendefinition.org/guide/data/ 
b. Creative Commons and Data: http://wiki.creativecommons.org/Data 


3. How can publishing the data itself address some of those concerns? 


a. Be first 
b. Same rigor of review and enforcement as for articles and other works 


Depositing Data/Publishing Data 


1. Typically associated with and at the same time as a publication or dissertation, but doesn’t
have to be
2. Embargoes
a. In some cases, an embargo can be established, such as for dissertations, for up to
2 years
3. Unique identifiers
a. For citation and other reasons, deposited data should be associated with a unique iden-
tifier; that is, it should be registered
b. UO Data repository (Scholars’ Bank) and many other data repositories now use DataCite
to register data sets and create DOIs for them
4. Data deposit example
a. Dryad: http://www.datadryad.org/pages/faq#depositing and video
http://www.youtube.com/watch?v=RP33cl8tL28&feature=youtu.be
b. Ecological monographs example: http://dx.doi.org/10.1890/11-1446.1
c. See “Data Availability” at end of full text: links to Dryad
(http://dx.doi.org/10.5061/dryad.gd856)


Citing Data 


1. What are the components of a citation?
a. Responsible party (e.g., study principal investigator [PI], sample collector, government
agency)
b. Name of table, map, or data set with any applicable unique IDs
c. Name of data center, repository, and/or publication
d. Analysis software, if required
e. Date accessed
f. URL and/or DOI/DOI link or other persistent link
g. See also: http://library.uoregon.edu/datamanagement/citingdata.html
2. Feedback: https://docs.google.com/spreadsheet/embeddedform?formkey=dHNxdDRXWmhmaGl1lcHhFWW12eGF1Vmc6MQ
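
The citation components listed above can be assembled mechanically. The Python sketch below shows one way to do so, assuming a simple period-separated layout. The function name, the ordering of the parts, and the sample values (including the DOI) are hypothetical and do not represent a required citation style.

```python
def format_data_citation(responsible_party, dataset_name, repository, link,
                         date_accessed=None, software=None):
    """Join the components of a data citation into one string.

    Optional components (analysis software, date accessed) are included
    only when they are supplied.
    """
    parts = [responsible_party, dataset_name, repository]
    if software:
        parts.append(software)
    parts.append(link)
    if date_accessed:
        parts.append(f"Accessed {date_accessed}")
    # Strip trailing periods so the separator does not double them.
    return ". ".join(part.rstrip(".") for part in parts) + "."


if __name__ == "__main__":
    # All values below are invented placeholders.
    print(format_data_citation(
        "Example, A. B.",
        "Hypothetical plot survey data",
        "Example Repository",
        "https://doi.org/10.1234/example",
        date_accessed="2013-01-15",
    ))
```

Many repositories supply a recommended citation for each data set; where one exists, it should take precedence over a locally assembled string such as this.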
APPENDIX B: Feedback and Assessment of
the Data Information Literacy Session 


End-of-Session Quick Feedback Form 


Please list, in order of priority to you, four or five things from today that were new to you 
or updated what you had previously known. 

What are three things that you will do differently in managing your data, based on today’s 
session? 


Post-Session Survey 


1. Skip this question if you did not participate in the interviews. If you participated in an inter-
view with Dean and Brian (spring of 2012), did the interview prompt you to examine your
data management practices, and if so, are there any changes you made before the instruc-
tion session?

2a. How effective was the following article in describing why data management is important
for your discipline?

Wolkovich, E. M., Regetz, J., & O’Connor, M. I. (2012). Advances in global change
research require open science by individual researchers. Global Change Biology, 18(7),
2102-2110. http://dx.doi.org/10.1111/j.1365-2486.2012.02693.x

1 = Ineffective | 2 | 3 = Okay | 4 | 5 = Very effective

2b. Comments:

3a. How effective was the following article in providing you with best practices that you could
apply to data management in your current research project(s)?

Borer, E. T., Seabloom, E. W., Jones, M. B., & Schildhauer, M. (2009). Some simple
guidelines for effective data management. Bulletin of the Ecological Society of America,
90(2), 205-214. http://dx.doi.org/10.1890/0012-9623-90.2.205

1 = Ineffective | 2 | 3 = Okay | 4 | 5 = Very effective

3b. Comments:

4a. How useful were the exercises (i.e., spreadsheet and relational database data structures) to
improving your understanding of and ability to work with structuring data?

1 = Not useful at all | 2 | 3 = Okay | 4 | 5 = Very useful

4b. Comments:

5. Are there changes you have made or plan to make in how you manage research data as a
result of the training session and readings?

6. Please list any other criticisms or favorable comments and suggestions about the readings,
exercises, discussion, or other aspects of the training session.
INTRODUCTION


The Data Information Literacy (DIL) project 
showed that developing educational programs 
on data for graduate students is a big area of 
opportunity for librarians. However, develop- 
ing successful DIL programs can seem daunt- 
ing, and you may be wondering: How do I get 
started? Do I have the knowledge to create a 
DIL program that will have an impact on stu- 
dents? Will I have the resources and support 
that I need to be successful? The DIL project 
teams, based in libraries at Purdue University 
(two teams), Cornell University, the University 
of Minnesota, and the University of Oregon, 
learned a great deal from their experiences. 
This chapter will share what we have learned 
to help other librarians create and implement 
DIL programs of their own. The information 
and guidance presented in this chapter is based 
on the collective experiences of the five DIL 
project teams in crafting their programs for 
graduate students in several science, technol- 
ogy, engineering, and mathematics (STEM) 
disciplines. We have included discussions of 
our approaches, pragmatic tips, and references 
to the resources that we used. 

In reviewing the work done by the DIL 
project teams, we saw a natural progression of 
activities taken by each team. The stages of de- 
veloping a DIL program are visualized in Fig- 
ure 9.1 and are used to structure this chapter. 
Of course, developing a DIL program is not 
a totally linear process and we found that the 
stages built on one another in many intercon- 
nected ways. However, the figure and the struc- 
ture of this chapter are meant to be illustrative 
of a general approach that could be applied by 
academic librarians. 

[Figure 9.1 Stages of developing a data information literacy program.
Planning: identify partners, raise awareness, gather information.
Developing the Program: allocate resources, develop a curriculum, build content.
Implementing the Program: choose an approach, deliver instruction.
Assessment and Evaluation: use assessment constructively, plan for sustainability, share successes.]

Figure 9.1 shows each of the stages of de-
veloping a DIL program: planning, developing
the program, implementing, and assessing and
evaluating. You may find, as we did, a need to
move back and forth between the stages: re-
tracing, reconsidering, and cycling through 
the tasks within a stage several times. In the 
sections that follow, we outline the activi- 
ties performed in each of these stages. For the 
planning stage, we have grouped together the 
information-gathering and awareness-raising 
activities that most often occur early in the 
process. For the development stage, we discuss 
actions necessary to develop the program, such 
as building the curriculum and the content in 
response to the needs identified in the plan- 
ning stage. For the implementation stage, we 
pull together information about different ap- 
proaches and issues that you may encounter in 
the process of delivering instruction. The ap- 
proach that you choose may be determined by 
the needs identified in the planning stage, as
well as by the types of content that you have
chosen to address. For the final stage, assess- 
ment and evaluation, we provide information 
on using assessment to inform future iterations 
of your program and tools for planning for sus- 
tainability so that your DIL program continues 
to grow and flourish. 


Why Are Librarians Teaching DIL Skills? 


The DIL project teams identified a strong de- 
sire for support in data management skills. 
The academic library community identified 
data curation as a top trend in 2012 (Tenopir, 
Birch, & Allard, 2012). This area of support 
provides an opportunity for libraries to gain 
entry into the research life of students and 
faculty. Having librarians teach data manage- 
ment skills is advantageous for many reasons. 
First, many academic librarians have a broad 
understanding of scholarship in general and 
an in-depth understanding of disciplinary 
best practices in scholarly communication. 
Librarians have the ability to identify and 
recommend resources, tools, and even skills 
that researchers need but may not be aware 
of. Librarians have experience and skills in the 
organization and dissemination of a variety of 
materials that may be applied to data manage- 
ment. It takes time, energy, and a fair amount 
of professional development to take on these 
roles, but doing so can result in new depths 
of involvement in the research mission of the 
academy and new partnerships with faculty 
and graduate students. 


PLANNING 


This section contains advice on identifying part- 
ners (e.g., faculty, fellow librarians, other cam- 
pus service providers), raising awareness of the 
importance of DIL, and gathering information 
so that the DIL program is informed by the real
needs of constituents. 

In times of tight budgets and limited re- 
sources, it is important to invest time and en- 
ergy strategically. As a library begins to develop 
its DIL program, it must align these activities 
with other information literacy and research 
support programs offered locally. For example, 
it might better serve the students, as well as 
you and your colleagues, to combine similar 
instructional approaches or integrate outcomes 
throughout a curriculum, as opposed to run- 
ning parallel programs. Integrating and map- 
ping the DIL outcomes into existing frame- 
works and assessment already taking place will 
reveal natural affinities between these informa- 
tion skill sets and will help ensure buy-in across 
library and campus partners. 

The integration of programs is also more ef- 
ficient for teaching and scheduling of library 
staff. Collaborating with fellow librarians and 
others who are already teaching related skills is 
the most efficient way to establish a new DIL
program. In fact, in any environment, fel-
low librarians are one of the most important
resources in the development and imple-
mentation of a DIL program. Collaboration
can take many different forms: some librarians 
may be interested in co-teaching, others may 
adjust existing materials to fit their needs in- 
dependently, while some may want only to be 
informed about progress. Whatever the form, 
collaboration with librarians is one of the surest
ways to establish and grow your program.


How to Identify Collaborators


Although data management and curation are 
topics that are often systemic to all disciplines, 
many faculty members may not see them as a
pressing need to address or have the time to teach
their students about DIL topics. Even if they
are aware of the need, they may not be prepared
to work with the library to address this
topic due to misconceptions regarding library
roles. Faculty can have a range of perceptions 
of the value that library and information sci- 
ence can bring to their laboratory or classroom. 
Given these and other differences among fac- 
ulty, it can be challenging to identify potential 
faculty partners. However, the following strat- 
egies assisted the DIL project and may have 
some value for other institutions. 


Low-Hanging Fruit 

The “low-hanging fruit” strategy centers on le- 
veraging your existing social capital: the con- 
nections that you have made with faculty and 
students through providing services to them. 
The people with whom you already have an es- 
tablished working relationship are more likely 
to be open to hearing your pitch on DIL. The 
partnerships forged by the DIL project teams 
with faculty members generally relied on exist- 
ing relationships. The route to partnerships can 
vary, but may look like this: 


• An existing need is identified by the professor.
The partnership can begin with instruction
sessions for a specific course or
to meet specific educational needs of the
professor’s students that may or may not
include DIL skills.

• A need is identified by the librarian. In
the case of the Carlson and Sapp Nelson
Purdue team, Megan Sapp Nelson had
identified issues in student work from
previous contact with students. This team
presented these issues to the course faculty
members (who concurred) as a part
of generating interest in DIL.


Coming to an Understanding 

“Coming to an understanding” refers to the 
progression of these initial opportunities to 
norming conversations during which the disci- 
plinary faculty member and the librarian come 
to a consensus that there is an issue that needs 
to be addressed, define what that issue is, and 
begin to develop strategies to address it. In the 
DIL project, this was accomplished primarily 
through interviewing faculty and students. 


Working Relationships 

“Working relationships” are developed as work 
with DIL progresses beyond informal conver- 
sations. This process generally happens as the 
librarian works closely with the disciplinary 
faculty members, asking questions and mak- 
ing suggestions. Needs identified may shift and 
potential strategies for addressing needs are 
not always realistic. Our teams struggled with 
time constraints as well as other challenges as 
the DIL project progressed. As with any
interdisciplinary project, several meetings are often
spent identifying common ground, as well as
identifying differences and possible roadblocks.
Those who identify and develop workarounds
for differences or roadblocks early have an
advantage with regard to long-term success.


DATA INFORMATION LITERACY
INTERVIEW TOOL


This project used a standardized tool, the Data
Information Literacy Interview Instrument
(http://dx.doi.org/10.5703/1288284315510),
to have a structured conversation around the DIL
needs of the students and faculty member
partner. Our goal was to understand how data
management and curation were practiced by the
research team members and to identify areas of
need as seen by the students and faculty in the
lab. The structured interview encouraged
professors to think carefully about issues of data
management, and it allowed librarians to introduce
the DIL competencies to faculty in order to
learn the analogous disciplinary terminology. These
interviews helped us identify the most serious
needs as perceived by the faculty member.


How to Promote Data Information 
Literacy (Raising Awareness) 


For the faculty and others in the lab, commit- 
ment to data management and curation within 
a research team is not without impact on re- 
sources. At the very least, time is invested in the 
learning and practice of new skills. New tools or 
technologies may be needed, which bring asso- 
ciated hardware, software, and time costs. The 
faculty and other stakeholders must see a com- 
pelling reason to invest scarce resources into de- 
veloping and engaging in a DIL program. 

The following arguments may resonate with a DIL
project’s disciplinary professors:


We can help you improve the data management 
practices among your current students. 
Decreased errors, more efficient use of 
time, reduced frustration, and easier 
data sharing between project partners 
are just a few of the benefits of having 
graduate students who receive training 
and are familiar with data management 
best practices. 

We can help ease the transitions between gradu- 
ate students who work with the same data. 
Better documented and organized data 
can ease the transition between gradu- 
ate students and has a direct impact on 
time spent by the professor searching 
for or recreating data from work done 
by a former graduate student. It also 
enhances professional reputation when 
students graduating from a particular 
research group have these skills.


MAINTAINING FLEXIBILITY TO SUSTAIN
WORKING RELATIONSHIPS


The goal of the project was to create a tailored
solution for each individual faculty member and
research group. In the process, disciplinary or
situational constraints were identified to provide an
educational intervention that met the needs of the
students and faculty partners. In the case of the
team from the University of Oregon, the research
group was completing their work on a grant.
Therefore a significant obstacle was the very short
time frame available before the project ended. By
being flexible, the team was able to create
interventions that addressed the faculty researcher’s
needs and time constraints while meeting the
goals of the DIL project.


We can improve your project’s compliance with
[insert funder]’s mandates regarding data 
management plans. Funding agencies 
increasingly expect or require data to 
be managed and shared. The decrease 
in available grant funding in recent 
years makes even slight differences in 
the quality of proposals extremely im- 
portant. A thorough and thoughtful 
data management plan (DMP) helps to 
support a case for the reliability of the 
group proposing the research. 

We can help you increase the impact of your 
research. Emerging data journals and the 
use of DOIs to permanently connect ar- 
ticles to data sets mean that professors 
now have the ability to track citations to 
their data as well as their articles. Stud- 
ies have shown that the publication of 
data sets along with articles increased 
citations (Piwowar, Day, & Fridsma, 
2007). Highly cited data sets may help 
to support the tenure and promotion of 
a researcher. 

We can help provide open access to your data
via sharing in repositories. For professors
concerned with the high costs and restricted
access to scholarly journals, providing data
through an open source repository represents
another level of service to the profession and
a way for researchers to enable long-term
access to their data.


ARGUMENTS FOR ENGAGING IN
DATA INFORMATION LITERACY


• We can help you improve the data management
practices among your current students

• We can help ease the transitions between
graduate students who work with the same
data

• We can improve your project’s compliance
with your funder’s mandates regarding data
management plans

• We can help you increase the impact of your
research

• We can help provide open access to your data
via sharing in repositories


Each of these points may be effective for 
some faculty members. The level of impact 
may be dependent upon rank, professional 
obligations, disciplinary expectations, and per- 
sonal opinions and habits. For an untenured 
assistant professor, the highest priority may be 
to create a strong case for tenure. Therefore,
emphasizing the impact that can result from
publishing data could be an effective tool.
For a full professor with a long-established
research history, the argument for safeguarding
the knowledge that they gathered throughout
their career and making it available for future
use may be more compelling. Getting to know
the faculty members’ priorities before having a
conversation will allow the librarian to select
the approach most likely to succeed.


WHAT WORKED FOR US


In our case, the actual approach for recruiting
faculty to the DIL project often started from a
reflective conversation during which disciplinary
professors considered how well their students
managed data. In nearly every case, the professors
expressed some serious concerns and needs around
the data practices of their students, concerns that
the professor did not have the time or expertise to
address. The goal was then to convert an observed
need(s) into an educational program targeted to
address the need(s).


Understanding the Needs 
of Constituencies 


A key component to success with the DIL proj- 
ect was developing an understanding of any 
disciplinary norms with data and incorporating 
these norms into our educational programming 
wherever possible. Conducting an environmen- 
tal scan will provide baseline knowledge that can 
help you develop your educational program. We 
had success reviewing the scholarly literature of 
the discipline along with reports, websites, and 
other relevant materials produced by organiza- 
tions or agencies affiliated with the discipline. 
In addition, conducting an environmental scan 
of local data management and curation prac- 
tices will familiarize you with disciplinary atti- 
tudes and behaviors. Spend time learning about 
practices in the department through identifying 
related resources such as courses on research 
ethics, training for graduate research assistants, 
or more informal manuals of practice available 
on department websites. 

Our suggestions for performing an environ- 
mental scan include the following: 


1. Perform a literature review in your dis- 
cipline. This might reveal published best 
practices for the specific subject area. Lit- 
erature may come from the disciplines 
themselves or from publications in the 
library science field. 


2. Perform an internet search for data man- 
agement best practices in your discipline. 
Pay special attention to results from rel- 
evant disciplinary societies and institutes. 

3. Know the funding agencies and organiza- 
tions in your discipline and whether data 
management requirements exist. 

4. Search for disciplinary data repositories
to learn what types of requirements and
guidelines they provide. Some repositories,
such as ICPSR (2012) or the UK
Data Archive (Corti, Van den Eynden,
Bishop, & Woollard, 2014), published
guidelines for managing data in ways that 
support their eventual curation. 

5. Find journals in the field that include 
data supplements and look at examples 
of archived data sets for ideas for culti- 
vating best practices. Some journals have 
requirements for open data. 

6. Identify professional organizations re- 
lated to the discipline. This may be use- 
ful as more programmatic approaches to 
data management evolve. 


There are many potential places to look for 
information, and as interest in data manage- 
ment continues to grow, the amount of infor- 
mation will increase. Some fields are further 
along than others and therefore have a much 
greater body of literature and online resources 
associated with data management. In our case, 
the teams focusing on ecology and related sub- 
jects found more information than the teams 
focusing on engineering. 

You may want to increase the scope of the 
environmental scan beyond disciplinary norms 
and include resources at your institution. Ques- 
tions that you may want to ask include 


• What are the specific resources relating to
data available to researchers at your
institution?

• What are local practices and attitudes
with regard to data management?

• What are the strategic priorities for your
institution?

• What potential barriers do you foresee?

• What resources (e.g., people, skills) would
you need to consider or include in your
program to be successful?


ENVIRONMENTAL SCANNING IS THE
WAY TO YOUR FACULTY’S HEART


Early in the process, each of the DIL teams set out
to increase our understanding of our respective
disciplines by conducting an environmental scan
of the discipline. Our intention was to identify
how each discipline recognized, discussed, and
addressed research data management and
curation issues. As expected, the quantity and quality
of the materials found by each team varied, but
every team was better informed in their
interactions with faculty and students. For example, as a
result of preliminary searching in the library
catalog, the Cornell librarian team member brought
a book published by The Long Term Ecological
Research Network on data management to
a meeting with the faculty. The faculty member
had worked with one of the authors and was very
interested in reading the book. The other DIL
teams had similar experiences and found that
faculty appreciated the librarians’ ability to find
pertinent disciplinary information and bring these
materials to their attention.


DISCIPLINES AND THE DATA INFORMATION
LITERACY COMPETENCIES


We hypothesized from the beginning that
researchers from different disciplines would
interpret the competencies differently, due to
specialized practices or cultural norms. This proved to be
true in each of the five DIL case studies. However,
we also found different data practices within the
subfields of disciplines or even among individual
projects. For example, though civil engineering
as a discipline is still considering how to respond
to challenges in managing and curating data, the
University of Minnesota team partnered with a
research group that was affiliated with the
Network for Earthquake Engineering Simulation
(NEES). NEES has an online virtual research
platform, NEEShub.org, that includes a data
repository. The University of Minnesota team
reviewed materials produced by NEEShub and
incorporated them into their educational
program, and vice versa: NEEShub.org incorporated
a version of the team’s instructional materials for
its online educational offerings.


Conversations with additional stakeholders 
may allow for the identification of additional 
needs and factors or clarify possible responses 
to include in your DIL program. The environmental
scan may help you to identify potential
collaborators as well. 


Understanding and Working With Faculty

Collaborating with faculty is often both
challenging and very rewarding. Faculty are busy
people, so it can be difficult for them to find 
the time to focus on a collaboration like this. 
In addition, their attention may be divided 
among research, teaching, and administrative 
duties. Ideally, you will work with faculty to 
identify the needs of the students and to deter- 
mine the timing and means of delivering the 
instruction. You will need to work together to 
determine what skills you can reasonably ad- 
dress. Most likely neither of you will have the 
expertise to address all of the students’ needs, 
but you will have complementary skills and can 
bring in outside experts as needed. 

The time needed to address all of these is- 
sues will vary with the degree of involvement of 
the faculty and the scope of your program. At 
minimum, a substantive initial meeting to dis- 
cuss student needs, timing, and means of de- 
livery will begin the process. If you are offering 
a one-session workshop, that may be all of the 
time you require for planning. However,
developing a project larger in scope, such as a series
of classes or a mini-course, may require much
more time to discuss and plan course content
and delivery.


RECOMMENDATIONS FOR
WORKING WITH FACULTY


Based on our experiences working with faculty
collaborators on the DIL project, we offer these
recommendations:


• Be prepared for faculty attention levels to shift
as the project progresses; their focus on the
project may ebb and flow depending on other
commitments.

• Have clearly defined expectations and roles
going into the project. However, be flexible if
those expectations and roles must change over
the course of the project.

• If you need faculty input at a certain time, or
require that a certain amount of faculty time
be allocated to the project, make those needs
clear and make sure that the faculty member
can make those commitments. You may even
want to specify these needs as a statement of
support in writing.

• Faculty (and students) often don’t understand
the language used in libraries. For example,
the terms data curation, data management,
and metadata may not resonate with them. Be
prepared to translate and speak the researcher’s
language.

• The faculty member has extensive knowledge
of the discipline; use that expertise to provide
context and rich examples for the students.
This is key in engaging the students in the
topic.


Understanding and Working
With Graduate Students

Graduate students are an important constitu- 
ency for academic libraries. They are often at 
the research frontline, not only in data collec- 
tion, processing, and analysis, but also in man- 
aging, describing, and documenting research 
data. In our experience, graduate students gen- 
erally receive minimal training to take on these 
important tasks. Working with graduate
students to develop and implement educational
programming is a way for librarians to address
a critical need of students and faculty, and a 
way to build or strengthen connections with 
this important user group. 

In planning and developing instruction in 
DIL competencies for graduate students, you 
must gain an understanding of their environ- 
ment and their needs from their perspective. 
Graduate students often engage in multiple 
roles: student, member of a research project, in- 
structor, and so forth. The nature and intent of 
the educational programming that you develop 
will shape your interactions with the gradu- 
ate students that you intend to target. Plan to 
spend some time talking and interacting with 
the graduate students you are targeting. We 
have found that graduate students’ interpreta- 
tions of their environment, roles, and perceived 
needs often vary greatly from those expressed 
by their faculty advisor. Graduate students will 
likely provide you with a more nuanced and 
complete understanding of how DIL compe- 
tencies are acquired and practiced, as well as 
how you could respond to any gaps. 

Although there are likely to be differences 
in the lives of graduate students according to 
their discipline, area of research, institution, 
and so forth, we found the following elements 
to be true of most of the graduate students we 
worked with in the DIL project: 


• Graduate students are busy people. They
are both learning their discipline and
taking on professional responsibilities
through teaching, research (their own
plus supporting other research activities),
and engagement. This leaves very little
time for things that are interpreted as
being something “extra” for them to do.

• Graduate students are under a lot of
pressure. Not only do graduate students take
on a lot of responsibilities, but they are
under pressure to produce results quickly.
As one faculty member told us, graduate
students have to do three things: find a
research project to join, produce results
that they can claim credit for, and
graduate. Anything else may be seen as
detracting from what graduate students must
accomplish as students.

• Graduate students are expected to be
“independent learners.” They have reached
a stage in their educational career where
they are expected to be able to formulate
and conduct their own research, develop
and teach their own courses, and
produce presentations and publications that
favorably compare to those of veteran
researchers. Although they are certainly
willing to help when needed, faculty
mentors generally expect graduate
students to be able to address questions and
problems on their own, without detailed
instruction.

• Graduate students are presumed to have
already learned DIL skills. When asked
how graduate students acquire knowledge
and skills with data generally, the
faculty we interviewed believed that
students had acquired them through previous
course work or other experiences as
undergraduates. For many graduate
students this was not the case. They lacked
previous experience in working with data
and acknowledged to us that they were
acquiring their skills as they went along.

• Graduate students may have a “short-term
mentality.” One sentiment from
faculty that we heard frequently in our
interviews was that graduate students did
not have a sense of the lasting value of
the data that they were producing and
therefore did not always recognize the
need to treat their data as an institutional
asset. Because graduate students lack the
experience of using data beyond what
was originally envisioned, faculty found
it tough to convince them to take better
care of the data that stay behind long
after they graduate.


RECOMMENDATIONS FOR WORKING
WITH GRADUATE STUDENTS


Although many factors complicate making
connections and working with graduate students on
developing data competencies, it can be done.
There are several key considerations in planning
and developing educational programs for this
population:


• In developing your DIL program, don’t just
focus on the faculty, but take time to connect
with the graduate students. Try to get a sense
of what they already know and what they
perceive as important in working with data.
Be prepared to articulate how your program
will address their needs, both in the future and
in their current situations.

• Recognize that they are busy people and try
to meet them where they are. This could
mean getting time in an existing meeting or
embedding yourself in existing structures.
It may also mean that you work with them
outside of a regular workday.

• Set realistic expectations, both for graduate
students and for your program. You may not
be able to do everything that you would like
to do right away. Give your program a chance
to develop over time and give yourself room to
be successful.


DEVELOPING THE PROGRAM 


Once you’ve established partners and determined
needs, it’s time to develop your program.
This section contains advice on allocating
resources (time, money, expertise, and so
forth), developing a curriculum in response to 
local interest and needs, and crafting the mate- 
rials that you'll teach. 


Available Resources 


When looking toward implementation, you 
should consider the resources available to you. 
What time, money, and expertise will you need 
to carry out the program, and how well do those 
match the resources available at your institu- 
tion? What technology do you need? Do you 
need additional training? Where will you teach? 
At what scale should you be planning your DIL 
program? The answers to these and other ques- 
tions should be derived from the information 
you gathered during the planning stage. For 
example, online educational resources are often 
the most scalable, but there is a substantial up- 
front cost in developing modules, such as gain- 
ing expertise (or hiring others) in using online 
learning technologies. On the other hand, in- 
person instruction demands can rapidly outstrip 
the available time of instructors. The services or 
resources that you previously identified that ad- 
dress similar needs may help you to make deci- 
sions. It may be easier to use the same technol- 
ogy others are using and substantially decrease 
the cost of developing online resources to the 
point that it becomes more sustainable than do- 
ing in-person instruction. If your organization 
has a centralized information literacy program, 
coordinating with it may be part of your plan, 
particularly if your program extends across sub- 
ject liaison areas. As you are working to flesh 
out the framework of your DIL program, it is 
essential to determine whether collaborators are 
on board and what times would work best for 
them. What are they able to commit to? Re- 
cruiting other librarians to collaborate on creat- 
ing programmatic instruction will help spread 
the work around as well, but again, influencing 
others to be a part of the program will likely 
be necessary. Do you have enough buy-in from 
collaborators to develop the DIL program at the 
scale you would like, or do you need to adjust
your expectations to better match the available
resources? Available resources will have a large 
effect on the scale and scope of the program that 
you are able to develop. 


Developing a Curriculum 


Developing an effective and successful cur- 
riculum starts with learning the needs of your 
constituencies to determine which learning 
goals and outcomes will most resonate with, 
and benefit, your students. Whether you use a
structured interview tool, such as the Data
Information Literacy Interview Tool, or a tool
developed in house, such as a quick survey sent
to students to identify pain points and areas of
interest, the feedback will help guide
curriculum development.

Upon completion of the needs assess- 
ment, the DIL instructor(s) will want to look 
through the identified topics of interest and 
begin to prioritize which topics to include 
in the curriculum. Some factors to consider 
include the length of instruction, the mode 
of instruction (e.g., online videos, in-person 
workshop), and whether any prerequisites 
should exist (i.e., do students need to have 
some baseline skills?). You should also con- 
sider your own areas of knowledge and ex- 
pertise; certain skills may fall outside the in- 
structors’ skill sets. Will you not include these 
skills, or will you recruit outside experts? For 
example, the University of Minnesota team 
found a user need in skills related to data 
visualization and analysis. Not confident in 
teaching these discipline-specific areas, the 
team incorporated campus resources where 
students could get more expert assistance in 
these areas, such as training on statistical tools 
and advanced Excel techniques. 

At this point in the process it can be
valuable to bring stakeholders (faculty, research
advisors) into the instructional planning to
act as sounding boards for the proposed goals
and outcomes of your instructional
intervention. These conversations can act as reality
checks to make sure that the information
gathered and decisions made regarding which
skills to cover align with faculty goals. These
conversations also assist with managing
stakeholder expectations, providing these key
players with a sneak preview (and opportunity to
provide feedback) before the instruction is
implemented.

Instructors also need to be realistic in their
expectations of student comprehension when
determining the scope of information that
can be effectively conveyed and successfully
transmitted in the time allotted. For many
students, this curriculum will be their first
instruction in DIL, so they will need time to
orient themselves.


DEVELOPING THE CURRICULUM


With the information gained from interviews with
faculty and graduate students, each DIL team set
out to identify instructional interventions to
address the gaps that we found in the graduate
curriculum covering data management skills.
However, each team found that students and faculty
expressed a potential interest in receiving training
around almost every one of the DIL competencies.
The Cornell team, for example, used the following
questions to help them narrow down which of the
competencies to focus on:


• Does the competency address a gap we found
in the curriculum?

• Do we have the expertise to address the need?

• If not, could we bring someone else in who
does have the expertise?

• Where could we add the most value?


After answering these questions in concert with
the disciplinary faculty, the team decided to
focus on these four DIL areas: data management and
organization, data analysis and visualization, data
sharing, and data quality and documentation.


Developing Outcomes


After determining which DIL competencies 
are most critical to the target audience, the 
next step is developing learning outcomes for 
instruction. “Learning outcomes are state- 
ments of what a successful learner is expected 
to be able to do at the end of the process of a 
learning experience such as the course unit or 
the course model” (Gogus, 2012b, p. 1950). 
Although the terms are often used interchange- 
ably, learning outcomes are distinguished from 
learning objectives as they tend to go beyond 
the general aims or goals of the instruction to 
the resulting expectations and evidence of what 
a student knows or can do after instruction 
(Gogus, 2012a). Without a clear idea of exactly 
what students need to learn or accomplish, it is 
difficult, if not impossible, to design effective 
instruction or assess whether or not it is suc- 
cessful; therefore, specifying learning outcomes 
is an essential first step. 

Good learning outcomes are specific, mea- 
surable or observable, clear, aligned with ac- 
tivities and assessments, and student centered 
rather than instructor centered. They also 
specify the criteria for and the level of student 
performance and begin with action verbs. 
Bloom’s taxonomy is an excellent source for ac- 
tion verbs and is widely used as an educational 
tool for classifying goals and outcomes (Gogus, 
2012a). 


Planning for Assessment 


Assessment should be considered early in the 
DIL program planning process, even before 
designing your instruction. However, many 
librarians are not assessment experts (including
most of us in the DIL project), which may
make the idea of assessment somewhat daunting.
Fortunately, we found that by keeping a
few basic principles in mind, assessment was
not so foreign. Plus, specifying learning
outcomes is the first step in planning for
assessment, so if you are following along, you have
already started defining your assessment
without realizing it.


TRANSLATING DIL COMPETENCIES
INTO LEARNING OUTCOMES


The DIL competencies identified by Carlson,
Fosmire, Miller, and Sapp Nelson (see Chapter 1 of
this book) were a useful starting place for
generating and refining the specific learning outcomes
for our educational programs. In order to turn a
DIL competency into a learning outcome, we
replaced the more vague terms such as familiarize
or understand with more action-oriented verbs
such as locate or define. We described each
learning objective as follows: learning area, preliminary
outcomes/objectives/learning goals, and possible
pedagogy. For example, from the broad
competency theme of discovery and acquisition of data,
the following learning outcomes are possible:


• Evaluate disciplinary data repositories (from
a given list according to particular criteria) in
order to determine requirements and suitability
for data deposit.

• Find and evaluate the quality of a data set in
order to decide whether it would be of use.


Note that these learning outcomes are more
specific, measurable or observable, and clear than the
more generic and unmeasurable “become familiar
with data repositories in the discipline.” Also note
that many different learning outcomes could be
developed to address the broad competency.

Assessment of student learning is the process 
of understanding what participants know and 
can do in relation to the outcomes that you are 
trying to achieve. It is not enough to say we 
have covered certain DIL topics and now our 
work is done. Without getting feedback from 
our students and seeing if it measures up to 
our criteria for success, it is impossible to know 
whether students have learned DIL skills and 
are able to apply their knowledge or transfer it 
to other situations. 


There are two types of assessment, formative
and summative, each with several levels
of assessment: institutional, program-level, and 
instruction-session level. Formative assessment 
occurs during the instructional process and 
provides feedback for the students on how they 
are doing. This lets instructors know how well 
students are receiving the instruction early on 
and allows for course corrections and clarifica- 
tions. Summative assessment occurs after the 
instruction concludes and provides measures of 
how well the instructional outcomes have been 
achieved and the efficacy of the instruction. 
Classroom (or instruction session) assessment 
of student learning is outcomes based and fo- 
cuses on what students demonstrably know and 
can do after instruction. This may include mea- 
sures such as examination of final assignments 
or projects using a rubric (a defined standard 
of performance) as well as pre- and post-tests. 
Program-level assessment is discussed in the 
final section of this chapter, “Assessment and 
Evaluation.” 
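
As a rough illustration of rubric-based assessment, the sketch below scores one student's work against a small rubric and flags the criteria that fall short of a threshold. The criterion names, the 0 to 3 scale, and the threshold are hypothetical; they are not taken from any DIL team's rubric and should be replaced with criteria drawn from your own learning outcomes.

```python
from statistics import mean

# Hypothetical rubric: each criterion corresponds to one learning outcome.
# Scores use an assumed scale of 0 (missing) to 3 (exemplary).
RUBRIC = ["file_naming", "metadata_documented", "backup_plan"]
PASS_THRESHOLD = 2  # assumed minimum acceptable score per criterion


def score_submission(scores: dict[str, int]) -> dict:
    """Summarize one student's rubric scores."""
    missing = [c for c in RUBRIC if c not in scores]
    if missing:
        raise ValueError(f"No score recorded for: {missing}")
    # Criteria below the threshold are the ones to target with feedback.
    needs_feedback = [c for c in RUBRIC if scores[c] < PASS_THRESHOLD]
    return {
        "average": mean(scores[c] for c in RUBRIC),
        "needs_feedback": needs_feedback,
    }


example = score_submission(
    {"file_naming": 3, "metadata_documented": 1, "backup_plan": 2}
)
print(example)
```

Keeping one criterion per learning outcome makes it easy to see which outcomes need to be revisited, either with an individual student or with the whole group.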

The more opportunities that students have 
to practice and receive direct and timely feed- 
back on their skills, in accordance with specific 
criteria and in situations they will encounter in 
the real world (or as close as possible), the more 
likely that learning will be achieved (Radcliff, 
2007). This can happen in a number of ways. 
One of the most effective is course-integrated, 
outcomes-based assessment that uses learning 
outcomes as the goals to measure student ac- 
complishment. Planning activities that allow 
students to practice their newly learned skills 
and to get or give feedback can help you as the 
instructor gauge whether students are grasping 
the concepts taught. 

THERE ARE MANY PATHS
TO STUDENT ASSESSMENT


The DIL teams used several student assessment
methods. The following examples show that success
can be achieved using a mix of both formal
and informal, and formative and summative,
methods when measuring student success.


Formative: Several teams (both Purdue teams and
the University of Oregon team) conducted formative
assessments that included informal examinations
of student-created materials, feedback
on in-class exercises, or “1-minute paper”
reflections to gauge students’ learning in order
to give feedback and make course corrections.


Summative: The Carlson and Sapp Nelson team
from Purdue created a rubric to allow themselves
and the TAs to judge the quality of students’
code. They also examined students’ final
design notebooks and attended the students’
final design reviews to give feedback.


Combination of formative and summative: Two
teams, Cornell and the University of Minnesota,
had students complete data management
plans (DMPs) in successive sections to
provide formative assessment and then used
the final DMPs for summative assessment.
They both assessed the DMPs according to
the criteria on a rubric, which can be found
in the team chapters.


Assessment, particularly outcomes-based assessment,
can be one of the most challenging
parts of developing an educational program.
However, you may already have resources available
at your campus (such as research and assessment
units, survey centers and tools, and
so forth) to assist you in crafting a workable
approach for assessing your DIL program. If
you are just getting started in crafting your DIL 
program, you may want to begin by employing 
lightweight methods, like the 1-minute paper, 
to make the process of assessment less onerous. 
The 1-minute paper exercise is a way to quickly
check for students’ self-reported understand- 
ing or confusion. The method is simply a very 
short in-class writing activity that can be com- 
pleted in 1 minute or less, by asking students to 
respond to a question designed to provide the 
instructor with feedback about their learning. 
For example, a popular set of questions to ask 
at the end of a session for a 1-minute paper is
(1) What have you learned? and (2) What do 
you still have questions about? Although it is 
common to use open-ended and reflective ques- 
tions, 1-minute papers are very adaptable and 
can be used for a variety of purposes depending 
on the questions asked. As you become more 
comfortable with your DIL program, you can 
add in more advanced approaches to ascertain 
its impact on student knowledge and behaviors 
with DIL topics. 


Building Course Content 


Once you've determined what to teach, devel- 
oping your teaching materials is the next step. It 
is critically important to capture the attention 
of the students early and often, and you should 
build your instructional content with this in 
mind. Successful engagement of students can 
be accomplished by tailoring your instruction 
specifically to their needs and situations. One 
approach would be to solicit real-world stories 
of data loss or error as a result of less than ideal 
data management. Sharing these stories early 
on in your DIL program may capture students’ 
attention. Ideally these stories come from fac- 
ulty (or fellow students), which makes the 
impact these losses have on work real. These 
stories are unfortunately all too common, so 
finding them should not be difficult. Drawing 
students into the topic at the outset and facili- 
tating their buy-in will pave the way for a suc- 
cessful instructional session. 

The content of your session can be delivered 
in a variety of ways. Two common approaches
are the instructor-led lecture on specific topics
and active learning–based activities
that allow students to get hands-on experience
with some specific data practices. (More
information on active learning can be found
on the University of Minnesota's Center for
Teaching and Learning website: http://www1.umn.edu/ohr/teachlearn/tutorials/active/what/.)
Strategies for approaching instruction
are discussed more fully in the next section, 
“Implementing the Program,” but the content 
will need to align with the instructional ap- 
proach (e.g., it might be hard to incorporate 
active learning into an online video tutorial), 
the learning styles of the intended audience, 
and the comfort level of the instructor. Gener- 
ally it is a good idea to have your students use 
the data sets that they themselves are responsi- 
ble for developing or managing in your lessons 
and activities. This reinforces the learning ob- 
jectives as the students have a real investment 
in the results of the lesson. However, it’s im- 
portant to recognize that not all students will 
come to your training with their own data sets, 
or they may be working on data sets that are at 
different stages of development. In these cases, 
you may need to provide fictional scenarios to 
give students something with which to work. 
Care should be taken in developing fictional 
data so that it reflects the attributes and charac- 
teristics of data that would normally be found 
in the student's field of study. 

The resources that you are teaching students 
to use may also drive your instructional con- 
tent creation. For example, if teaching about a 
particular data repository (either institutional 
or subject-based), you will want to make sure 
that the content you create matches the re- 
quirements of that repository. In fact, looking 
at relevant data repositories may be a good way 
to determine which pieces of content are the 
most important to cover for a particular audi- 
ence—for example, metadata standards used 
by the repository, policies for preservation, and 
perhaps even licensing concerns. 

LEARNING OUTCOMES AND THE
INSTRUCTIONAL DESIGN PROCESS


In the DIL project, the use of learning outcomes
helped us design a clear picture of the intended
results and therefore helped to guide the development
of our instruction activities. For example,
for the learning outcome “evaluate disciplinary
data repositories in order to determine requirements
and suitability for data deposit,” the Cornell
team knew they would need to provide students
with a list of potential repositories in their
field (using Databib to identify them: http://databib.org/)
as well as a list of potential evaluative
criteria for data deposit. They then designed
an in-class exercise that had students examine at
least one repository in their subject area according
to the evaluative criteria to see if they would
recommend depositing their data (an activity that
is grounded in their real-world practice) and then
report back to the whole group with their findings
(allowing the instructor to assess their evaluation
and give feedback).


As more librarians and faculty develop
educational programs and materials for teaching
DIL competencies, more ready-made
instructional content is becoming available.
Not all resources may be appropriate to use 
when developing and implementing your own 
DIL program, but they might be worth in- 
vestigating as they could spark some ideas for 
your specific context and audience. See the 
end of this chapter for a list of resources that 
may help you design your own DIL program. 


IMPLEMENTING THE PROGRAM 


Finally! You've worked so hard to get to this
point, which is perhaps the most exciting stage
of the process, and you're ready to teach. This
section contains advice on choosing and deliv- 
ering an instructional approach, whether it is 
in-person instruction for small groups, instruc- 
tion aimed at a large online audience, or some- 
where in between. 


Choosing an Approach


Once the goals and outcomes of the course are
determined, you will need to develop instructional
approaches or pedagogies (lectures,
learning activities, online videos, and so forth) 
that lead students to successfully acquire the 
knowledge and skills that you have identified 
as your learning outcomes. There are many 
possible instructional approaches that one can 
choose. The DIL teams each chose a different 
approach to fit the needs of their communities. 
Each of these approaches had associated pros 
and cons, listed in Table 9.1. 


• The Cornell team offered a one-credit
mini-course on data management with
sponsorship from the natural resources
department.

• The Carlson and Sapp Nelson team
from Purdue took an embedded librarian
approach with their project partner, a
service-learning center based in electrical
and computer engineering. They offered
a skills session to introduce concepts and
good practices, designed tools and resources
to support their application, and
attended design reviews and team meetings
as consultants to help encourage and
reinforce their adoption.

• The Bracke and Fosmire team from Purdue
developed a checklist of practices for
students in an agricultural and biological
engineering group and then offered
a series of workshops to teach the skills
needed to carry out these practices.

• With assistance from an instructional
designer, the University of Minnesota
team developed an online training module
that they used in conjunction with an
in-person session for a civil engineering
research section.

• The University of Oregon team partnered
with a group conducting research in ecology
whose funding for the project was
winding down. They offered a seminar
to connect students to data management
and curation resources developed by their
disciplinary community.


TABLE 9.1

The Pros and Cons of the DIL Instructional Approaches


Approach: mini-course (Cornell)

Pros

• Co-teaching the course with faculty from the department increased faculty engagement

• Course format provided opportunities to practice application of best practices, and the
ability to build on prior sessions

Cons

• Time investment is substantial, both for librarian and faculty collaborator

• Must have buy-in from university department to offer course, and from library
administration to spend librarian time teaching. (Many libraries consider teaching
a university course a high achievement for a librarian, so this may not be a con but
should still be considered)


Approach: online course (Minnesota)

Pros

• Very scalable. While initial time investment may be high, modules can be reused,
repurposed, and recombined, increasing the potential impact of the training

• Online format provides the opportunity for students to reference the materials at the
time of need, potentially resulting in improved data practices

Cons

• May require assistance from an instructional designer, or someone with experience
building online content

• Impact is increased by combining with an in-person session, due date, or some other
kind of encouragement. Students tended to forget or put off completing the module
until the last minute


Approach: one-shot session (Oregon)

Pros

• Small group setting allows materials to be closely targeted to their specific needs,
increasing student awareness of tools, resources, and best practices

• Because the time investment is small, increased likelihood of getting buy-in from
reluctant or busy faculty and graduate students

Cons

• Not very scalable if you have lots of research groups interested in such targeted
training

• Limited time with students may mean that material covered is not retained as well as it
would be if there were more opportunities for activities and repetition


Approach: embedded librarianship (Carlson and Sapp Nelson team from Purdue)

Pros

• Group setting allows materials to be closely targeted to their specific needs, increasing
student awareness of tools, resources, and best practices

• Ongoing relationship with students and faculty provides multiple opportunities for
evaluating student work and providing feedback

Cons

• Not very scalable as interest increases

• Time investment is large, since the librarian participates in group meetings and is
closely involved in development of tools and resources


Approach: series of workshops (Bracke and Fosmire team from Purdue)

Pros

• Workshops allow materials to be closely targeted to developing specific skills and best
practices

• Clear expectations such as a checklist make it easier for students to see what is
required, resulting in increased student compliance

Cons

• Very specific outcomes (checklist of practices) may result in the students not realizing
that the same best practices could apply in other situations

• Time investment is large, since librarian must develop specific checklist and
accompanying instruction. May not be easily scaled if other groups want such
targeted instruction


For a more in-depth discussion of each 
team’s DIL program, please refer to Chapters 
4 through 8. 


Delivering Instruction 


If you've taught other classes and workshops
then you may be completely comfortable with
being in front of a classroom delivering instruction.
If, however, you're new to teaching or are
delving into new territory, perhaps developing
online modules for the first time, then you may
be feeling pretty nervous. The best solution for
nervousness is preparation and practice. Once
you've planned everything out, the next step
may be practicing with colleagues, significant
others, or even your dog (probably not helpful
if you're developing online modules). To help
you prepare, in this section we'll discuss some 
things you'll need to consider while you're de- 
livering instruction. 


Scheduling Sessions 

As discussed in the previous section, there are 
a wide variety of implementation possibilities, 
from online courses to embedded librarians and 
beyond. In the case of workshops or other train- 
ing sessions, it’s important to schedule them to 
coincide with research team availability. Ideally, 
DIL training would be integrated and coincide 
with a relevant part of the data life cycle, but 
this is not always possible. A research team may 
have individuals working with data from mul- 
tiple stages of the data life cycle. In general, it is 
best to avoid conflicts with research field trips
or conferences, and to target instructional in- 
terventions so that they are timed as closely as 
possible with real research workflows and events. 
As with any type of instruction, just-in-time in- 
struction that students can apply immediately 
will be more effective than instruction not tied 
to a recent or upcoming activity. If students have 
no opportunity to apply what is being taught 
(for example, they are new graduate students 
without data of their own), then you may want 
to incorporate more activities and opportunities 
for practice with fictitious data to help reinforce 
what is being taught. Note that schedules that 
work for research teams may not coincide with 
typical academic schedules. You may find that 
weekend workshops or winter sessions timed 
to avoid summer field research will work better 
than semester-based scheduling. 


Feedback for Students 


In the midst of planning the curriculum and 
collecting the content that you'll teach, it can be 
easy to forget to plan how you'll communicate 
with your students. Again, in our experience, 
different teams used different methods. For the 
online modules developed by the University of 
Minnesota team, communication was mostly 
via e-mail, some of it automated. For the mini- 
course taught by the Cornell team, the Black- 
board course management system's discussion 
board was used for collecting and providing 
feedback on assignments. The University of 
Oregon’s team as well as both of Purdue’s teams 
included plenty of time for direct student con- 
tact to provide feedback on their work. The 
method you choose should fit your comfort 
level and work for what you're trying to ac- 
complish. Formal feedback will require differ- 
ent methods than will informal feedback, so if 
you're assigning grades you will need a more
formal system in place than if you're collecting
notebooks and writing a note or giving verbal
feedback in a one-shot workshop. Regardless 
of the method you choose, feedback is an im- 
portant step toward maintaining the students’ 
interest and engagement and should not be 
overlooked. 


Maintaining Interest (Theirs and Yours) 

In the “Building Course Content” section, we 
suggested that you build instructional content 
with the goal of capturing the attention of the 
students early and often. Strategies for main- 
taining their interest include tailoring instruc- 
tion specifically to their needs and situations, 
and using active learning techniques when 
you're in a face-to-face environment. However,
no matter how well you prepare, there will 
probably be a moment when you notice a stu- 
dent yawning, multitasking on e-mail, or not 
completing the online modules you worked so 
hard on. Remind yourself that graduate stu- 
dents are busy people, don’t take it personally, 
and then see if there’s anything that you can do 
to improve their experience. Note that if you're
bored, your students are definitely bored. You 
cannot be an engaging teacher if you're not 
excited, so make sure you plan content that’s 
exciting to you too. The more excited or pas- 
sionate that you are, the easier it will be to draw 
students into the topic, and facilitating their 
buy-in will ensure a more successful program. 


Responding to Formative Assessment 

Formative assessment happens during the in- 
structional process and provides feedback for 
students on how they are doing, lets instructors 
know how students are receiving the instruc- 
tion, and allows for course corrections and 
clarifications. Assessment while you're teach- 
ing can help reveal whether or not students are 
learning, whether you're covering too much
or too little, and when it may be necessary to
make adjustments. Adjustments can be substantial,
such as adding a new class session or
online module to provide additional informa- 
tion. They can also be small, like changing a 
due date to accommodate students’ schedules, 
providing additional help, or reviewing a topic 
at the beginning of the next class or meeting. 
Avoid asking for feedback about something 
that you are unwilling to change. For example,
if you or your faculty stakeholders require
that a certain topic be covered, don’t ask stu- 
dents if they would rather skip it. If you do ask 
for feedback, let the students know how you'll 
be using it and why it’s important so that they 
can be properly engaged in the process. Most 
importantly, follow up to let the students know 
how you used their feedback to make improve- 
ments to the instruction. If assessment is han- 
dled this way, students are much more likely 
to continue to be active participants since they 
can see that they have a real effect on the way 
instruction is delivered. 


ASSESSMENT AND EVALUATION 


After completing your instruction, it is im- 
portant to assess and evaluate what worked 
and what didn’t. This section contains advice 
on using program assessment constructively, 
planning for sustainability, and sharing your 
successes in order to continue to grow your 
program. 


Making Good Use of 
Program Assessment 


Program-level assessment focuses on the effec- 
tiveness and reach of an instructional program 
as a whole. A sustainable program does not 
end once instruction has been delivered. Getting
feedback from students and stakeholders
helps you to determine what improvements
and changes should be made. Program-level as- 
sessment can include measures of student sat- 
isfaction, self-reported skill attainment, teacher 
effectiveness, and program design (usefulness 
of activities, readings, and so forth). 


How Our Teams Used Program
Assessment to Inform Next Steps

Once the DIL teams offered their instruction,
each team conducted a summative program 
assessment. Here again the specific metrics 
and approaches varied according to the team’s 
programmatic goals and objectives. Some of 
the teams distributed an evaluation survey for 
students to complete and return. Other teams 
conducted brief follow-up interviews with fac- 
ulty and graduate students to learn how their 
efforts impacted them and others. Still others 
collected student work from the class and ana- 
lyzed it for evidence that the students under- 
stood and were able to apply the concepts that 
they had learned in the program. The feedback 
we gathered helped us to plan the next steps in 
developing a sustainable program that would 
continue to fill the needs of our communities. 

The Cornell team’s final evaluation survey 
showed marked differences between students’ 
self-assessments of their knowledge, skills, and 
abilities with regard to the learning outcomes 
before and after taking the class, and nearly all 
students indicated that they would recommend 
the class to other graduate students. This assess- 
ment helped the Cornell team to successfully 
propose that the class become a regular offer- 
ing of the Department of Natural Resources. 
Responses concerning the usefulness of certain 
topics also helped the team refine the course 
framework for the 2014 semester. 
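
A before-and-after comparison of this kind can be tabulated in a few lines of code. The sketch below is only an illustration: the outcome names and ratings are invented for the example (they are not the Cornell team's data), and it assumes that each student rated their own ability on a 1 to 5 scale before and after the class.

```python
from statistics import mean

# Invented example ratings (1 = low, 5 = high), one entry per student.
pre = {"describe_data": [2, 1, 3], "evaluate_repository": [1, 2, 2]}
post = {"describe_data": [4, 4, 5], "evaluate_repository": [3, 4, 3]}


def mean_gain(pre_ratings, post_ratings):
    """Return the change in mean self-rating for each learning outcome."""
    return {
        outcome: round(mean(post_ratings[outcome]) - mean(pre_ratings[outcome]), 2)
        for outcome in pre_ratings
    }


gains = mean_gain(pre, post)
for outcome, gain in gains.items():
    print(f"{outcome}: {gain:+.2f}")
```

Because these are self-reported ratings, a gain suggests increased confidence rather than demonstrated skill, so it is best read alongside evidence from student work.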

The Carlson and Sapp Nelson team from 
Purdue interviewed the electrical and com- 
puter engineering faculty they worked with 
to assess their perceptions of their work at the
end of the program. As a result, conversations 
ensued about scaffolding DIL further into the 
curriculum. The Carlson and Sapp Nelson 
team from Purdue also reviewed and analyzed 
the lab notebooks students produced to gain 
a better understanding of how students did 
or did not incorporate what was taught in the 
DIL program into their work. The results of 
this analysis led to a more student-driven ap- 
proach to incorporating DIL into the student 
teams in Engineering Projects in Community 
Service (EPICS). 

The feedback that the Bracke and Fosmire 
team from Purdue received helped them to 
determine what the agricultural and biologi- 
cal engineering students had learned from the 
experience and to define areas for further ex- 
ploration. They presented their work in DIL 
to the associate dean of research in the Col- 
lege of Agriculture and received her support 
for additional offerings. In the spring of 2014, 
librarians at Purdue taught a semester-long pi- 
lot program in data management and curation 
for graduate students in the College of Agricul- 
ture. The program was structured in ways that 
allowed for student interests and needs to drive 
the content to be covered. In addition to being 
responsive to student needs, adopting a flexible 
structure will allow librarians to better under- 
stand questions on scope, pace, and delivery of 
the material. 

Evaluative assessment feedback collected 
from students at the end of the online course 
helped the University of Minnesota team 
make some adjustments to improve the civil 
engineering students’ experience with the on- 
line modules they developed. Students were 
so new to the concepts of data management 
that even an introductory video was confusing 
to them. Delivering instruction online lim- 
ited contact between students and instructors, 
making it hard for students to ask questions
or discuss difficult concepts. The Minnesota 
team has started offering more in-person 
workshops, which are very popular. Online
content that students tended to put off completing
has now been repurposed for a hybrid
online/in-person, flipped classroom approach:
video lessons are sent to students the day
before the hands-on workshop.
The online class is still available as a standalone,
self-paced tutorial for those who can’t
come to the workshops.

After completing the training session, the 
Oregon team surveyed the ecology and land- 
scape architecture faculty and students about 
the session to gauge overall usefulness and in- 
vestigate changes in data management practices 
of students as an outcome. These conversations 
resulted in several opportunities to grow the 
program, including a request to teach more 
advanced topics to another research group, 
a request for a guest course lecture, and new 
collaborations with faculty in the chemistry 
department who want instruction for their re- 
search teams. Finally, the original participating 
faculty are now proponents of DIL instruction 
for incoming graduate students. With all of 
these opportunities, the University of Oregon 
team is especially interested in assessing the 
proper balance between making things very 
specific for a particular group and creating con- 
tent that can be used more broadly. 


Developing a Sustainable DIL Program 


Building a sustainable DIL program in your 
library will require continued investment of 
time and resources. However, the skills and 
knowledge sets needed to teach data manage- 
ment are not unfamiliar to academic librarians, 
particularly those who have been involved with 
information literacy efforts previously. 


We have found that the key components to 
grow your initial efforts into sustainable pro- 
grams include the following: 


• Identifying what worked well with your
initial efforts

• Engaging with and obtaining the commitment
of the library (particularly library
liaisons) to those areas to sustain
and advance your program

• Investing in scalable educational tools
that can be repurposed and easily updated
to meet the needs of a broad user
audience


Above all, communication is essential. A 
campus with hundreds or even thousands of 
researchers presents a unique challenge for pro- 
motion and awareness of new services offered 
by libraries. Within academic institutions, we 
commonly face communication silos of collegiate,
departmental, research group, and even
individual proportions. A successful DIL
program requires a communications strat- 
egy that brings together the various research 
services offered by the libraries and promotes 
them in a systematic way. 


Identify What Worked
(and Share Successes!)

As you evaluate and reflect on your DIL 
program, think about what really captured 
people’s attention and which aspects of your 
training were the most engaging. In some 
cases, there might be an academic depart- 
ment on campus that is already engaged in 
data management topics and will uniquely 
benefit from your DIL training program. A 
good example of success will help build momentum
for your program, and this will become
a jumping-off point for your campus-wide
program. Once you have an advocate or
two, interview them to find out what incentives
worked for them. Did they appreciate
the training of their graduate students? Did 
they like the integration of the DIL principles 
in their curricula? What impact did your pro- 
gram have on student practices? Then use this 
as your “case study” when talking with other 
departments on campus. Tell the story of what 
worked with your initial cohort of students in 
order to demonstrate how these same prin- 
ciples can be expanded to the new depart- 
ment or discipline. Better yet, see if you can 
get your faculty partners or students that you 
taught to tell the story of your DIL program 
and its impact on others. Your work is likely 
to resonate with other faculty and students if 
it is conveyed by their peers. 


Library Staff Engagement 

To take the DIL program to a new discipline, 
you will likely need the buy-in and commit- 
ment of the library subject liaisons to that de- 
partment. Rather than viewing the librarian 
liaison as the gatekeeper to that discipline, re- 
member that he or she is your biggest resource! 
You cannot teach every student DIL compe- 
tencies and expect to develop specialized sub- 
ject knowledge in each area as well. 

There are several elements to consider when 
engaging with your library’s liaisons. First, in- 
vest in training for library staff on DIL skills. 
Training sessions (perhaps adapting the same 
DIL program that was developed for your stu- 
dents) should result in the library liaison ob- 
taining a better understanding of DIL skills 
and, hopefully, a stronger commitment to the 
sharing of this knowledge. Librarians should be 
familiar with the DIL competencies in order 
for them to fully understand the benefits to 
their user populations. 

Next, empower subject liaisons to lead the 
DIL efforts in their area. Library staff, particu-
larly those who are subject librarians, will be
essential to reaching new user populations as 
information gatherers, instructors, and promo- 
tion experts. Have your library liaisons start 
with interviewing one or two faculty in their 
departments on DIL-related needs using the 
DIL interview protocol or other instruments, 
or by leading a focus group of graduate stu- 
dents. This information-gathering exercise will 
highlight the disciplinary needs for the group 
and empower the liaisons to take control of 
their users’ needs.

Finally, remind staff that they don’t have to 
start from scratch. Once their population’s DIL 
needs are better understood, you can work with 
them to evaluate your existing programming 
and adapt the DIL training to meet the needs 
of their disciplines. With their subject expertise 
and your DIL experience, your combined ef- 
forts will enable you to scale a DIL program to 
a variety of disciplines on campus. 


Scalable Delivery Tools 

To most effectively expand your DIL program, 
consider educational delivery that is well docu- 
mented and/or easily captured for reuse. For ex- 
ample, if you are teaching a workshop session, 
consider creating a written script and a detailed 
session outline that includes “stage cues” indi- 
cating any actions on the part of the instructor. 
This documentation, along with your presenta- 
tion slides and other handouts, will allow for 
another individual to replicate and adapt your 
session more easily. For example, the University 
of Minnesota team created an instructor's guide 
to their hybrid course (available at http://z.umn. 
edu/teachdatamgmt) to better allow for other 
library liaisons, at their institution and beyond, 
to adapt and reuse their materials. Alternatively, 
your training might be captured in a digital for- 
mat to scale beyond the in-person format and 
reach a variety of users. Recording a training 
session using Camtasia Relay, Jing, or a more so-
phisticated video recording tool will allow you to
post the video to the Web or share it via e-mail. 
Finally, any training tools that you use should 
follow the same principles of data management 
that you are teaching. Your files should be avail- 
able using open standards to allow for the broad- 
est possible reuse. Consider including open li- 
censes, such as Creative Commons (https:// 
creativecommons.org/licenses/), to indicate how 
others (including non-librarians!) can adapt and 
reuse your effective training materials. 


Aim High 

Establish ambitious long-term goals for your 
program—for example, working with your in- 
stitution’s graduate school to have data man- 
agement modules integrated into the overall 
orientation activities for incoming students to 
ensure that all students are exposed to at least 
the core principles. To do this, identify partners 
who can assist with developing resources or 
who can champion your cause. Locating part- 
ners such as your graduate school or your vice 
president for research and understanding their 
mission, goals, and the venues they work in, as 
well as the training they already provide, can 
help you reach beyond the libraries and truly 
provide institutional-level support for DIL. 


CONCLUSION 


We have shared our work from the DIL project 
with the hope that it will inspire and encourage 
librarians and others to take the next step in de- 
veloping and implementing DIL programs of 
their own. We have presented this information 
in ways that allow for flexibility in adaptation 
and further development, since we know that 
others will continue to innovate beyond what 
we've talked about here. We have also included 
both what we did in our programs and what 
we would recommend based on our experi- 
ences, sharing honestly when things did not go 
as hoped or expected, since we want to enable 
people to learn from our work. Above all, we 
hope that this chapter is useful to you in con- 
sidering how you might go about launching a 
successful DIL program of your own. 

As DIL is still an emerging area, we encour- 
age you to share the DIL instruction work that 
you do with your colleagues and peers. We 
need to develop a community of practice in 
this area and to learn from each other. Only by 
connecting and communicating with other ini- 
tiatives—within our libraries, within our insti- 
tutions, or with the broader community—can 
we continue to grow and build DIL in order 
to help prepare and educate the next genera- 
tion of researchers for their professional careers. 
We look forward to the amazing work that you 
will do to help prepare the next generation by 
teaching them the skills they need for effective 
data management and curation. 


REFERENCES 


Corti, L., Van den Eynden, V., Bishop, L., & Wool-
lard, M. (2014). Managing and sharing research
data: A guide to good practice. London: Sage Pub- 
lications Ltd. 

Gogus, A. (2012a). Bloom’s taxonomy of learning 
objectives. In N. M. Seel (Ed.), Encyclopedia of 
the sciences of learning (pp. 469-473) [Springer- 
Link version]. http://dx.doi.org/10.1007/978-1 
-4419-1428-6 

Gogus, A. (2012b). Learning objectives. In N. M. 
Seel (Ed.), Encyclopedia of the sciences of learning 
(pp. 1950-1954) [SpringerLink version]. http:// 
dx.doi.org/10.1007/978-1-4419-1428-6 

Inter-University Consortium for Political and Social
Research (ICPSR). (2012). Guide to social science
data preparation and archiving: Best practices
throughout the data lifecycle (5th ed.). Retrieved
from http://www.icpsr.umich.edu/files/deposit
/dataprep.pdf

Piwowar, H. A., Day, R. S., & Fridsma, D. B. (2007). 
Sharing detailed research data is associated 
with increased citation rate. PLoS ONE, 2(3), 
e308. http://dx.doi.org/10.1371/journal.pone 
.0000308 




Radcliff, C. J., Jensen, M. L., Salem, J. A. Jr.,
Burhanna, K. J., & Gedeon, J. A. (2007). A
practical guide to information literacy assessment
for academic librarians. Westport, CT: Libraries
Unlimited.

Tenopir, C., Birch, B., & Allard, S. (2012). Aca-
demic libraries and research data services: Current
practices and plans for the future. Chicago, IL: As-
sociation of College and Research Libraries.


228 PART II Moving Forward 


APPENDIX: Links to Useful Resources 


Because links quickly become out of date, we’ve chosen to provide only a small sampling of the
wide range of resources that exist, particularly resources that our teams relied most heavily upon. In 
addition to the DIL project website, we recommend the e-Science Portal for New England Librar- 
ians (http://esciencelibrary.umassmed.edu/), particularly the DIL section (http://esciencelibrary
.umassmed.edu/DIL_Home).


Resources for Learning About Faculty Needs 


Interview instruments used to discuss data management needs and expectations with faculty col- 
laborators as part of our DIL project: http://dx.doi.org/10.5703/1288284315510 

Career profiles, including description of the profession, key roles and responsibilities, and how 
research data management figures in responsibilities; created as part of Data Management Skills 
Support Initiative (DaMSSI) with a focus on higher education institutions in the UK: http:// 
www.dcc.ac.uk/training/data-management-courses-and-training/career-profiles

Data curation profiles provide information about data management requirements as articulated by 
the researchers themselves; authors from universities across the United States. Data Curation 
Profiles Directory: http://docs.lib.purdue.edu/dcp/ 


Resources for Learning About Graduate Student Needs 


Interview instruments used to discuss data management needs and expectations with graduate 
students as part of a DIL project: http://dx.doi.org/10.5703/1288284315510 

Although they are more general in scope, the following reports discuss developing and offering 
library services for graduate students: 

Lewis, V., & Moulder, C. (2008). SPEC Kit 308: Graduate student and faculty spaces and 
services. Retrieved from http://publications.arl.org/Graduate-Faculty-Spaces-Services 
-SPEC-Kit-308/ 

Covert-Vail, L., & Collard, C. (2012). New roles for new times: Research library services for 
graduate students. Retrieved from http://www.arl.org/storage/documents/publications
/nrnt-grad-roles-20dec12.pdf 


Resources for Exploring Assessment 

General Assessment Resources 

Cornell University Center for Teaching Excellence’s Setting Learning Outcomes (http://cte.cornell 
.edu/teaching-ideas/designing-your-course/settting-learning-outcomes.html) and Assessing Stu- 
dent Learning (http://cte.cornell.edu/teaching-ideas/assessing-student-learning/index.html) are 
good introductions to developing learning outcomes for instruction and to general course and 
program-level assessment. You may have similar guidance at your institution. 

University of Illinois at Urbana-Champaign University Library’s Tips on Writing Learning Out- 
comes (http://www.library.illinois.edu/infolit/learningoutcomes.html) provides a quick guide to 
the definition and creation of learning outcomes for information literacy assessment. 

Assessment and evaluation site from Purdue University’s Center for Instructional Excellence (http://
www.purdue.edu/cie/teachingtips/assessment_evaluation/index.html) differentiates between as- 
sessment and evaluation and provides useful resources and tips. 

This book is an excellent introduction to and reference on general information literacy assessment: 
Radcliff, C. J., Jensen, M. L., Salem, J. A. Jr., Burhanna, K. J., & Gedeon, J. A. (2007). A 
practical guide to information literacy assessment for academic librarians. Westport, CT: Libraries 
Unlimited. 


Classroom Assessment Test Resources 
(Including the 1-minute paper exercise many of the teams used, and many more.) 


The following is a classic resource on assessing student learning in the higher education classroom. 
Angelo, T. A., & Cross, K. P. (1993). Classroom assessment techniques: A handbook for college 
teachers. San Francisco: Jossey-Bass. 

The following article details three approaches to information literacy assessment (fixed-choice 
tests, performance assessments, and rubrics) and their theoretical backgrounds, benefits, and 
drawbacks. 

Oakleaf, M. (2008). Dangers and opportunities: A conceptual map of information literacy 
assessment approaches. portal: Libraries and the Academy, 8(3), 233-253. 

The following online sources provide a quick definition and several generally applicable classroom 
assessment techniques (CATs) for evaluating course outcomes, attitudes, values, self-awareness 
and instruction: 

George Washington University’s Teaching & Learning Collaborative: http://tlc.provost 
.gwu.edu/classroom-assessment-techniques 

Virginia Commonwealth University’s Center for Teaching Excellence: http://www.vcu 
.edu/cte/resources/cat/index.htm 

Iowa State University’s Center for Excellence in Learning and Teaching: http://www.celt 
.iastate.edu/teaching/cat.html 

Field-tested learning assessment guide (FLAG) for science, math, engineering, and tech- 
nology instructors’ Classroom assessment techniques (CATs)—overview provides peer- 
reviewed classroom assessment techniques as well as tips on their use by faculty mem- 
bers in the STEM disciplines: http://www.flaguide.org/cat/cat.php 


Resources for Building Course Content 


The University of Minnesota’s online and hybrid course content is available for reuse and adapta- 
tion. There is also an instructor’s guide that walks through the pacing of the course, plus links to 
handouts and activities used in the in-person session. http://z.umn.edu/teachdatamgmt 

The New England Collaborative Data Management Curriculum (NECDMC) was put together by 
the Lamar Soutter Library at the University of Massachusetts Medical School in collaboration 
with libraries from several other institutions. The curriculum can be used to teach data manage- 
ment best practices to undergraduates, graduate students, and researchers in STEM disciplines. 
http://library.umassmed.edu/necdmc/index 




Education Modules developed by DataONE (Data Observation Network for Earth). Education 
modules are CC0 (No rights reserved), but DataONE asks that users cite DataONE and ap-
preciates feedback. http://www.dataone.org/education-modules

ICPSR (Inter-university Consortium for Political and Social Research) published Guide to Social 
Science Data Preparation and Archiving, which is a thorough introduction to data management 
best practices in the social sciences. http://www.icpsr.umich.edu/icpsrweb/content/deposit 
/guide/index.html 

MANTRA research data management has online training designed by the University of Edinburgh 
for “PhD students and others who are planning a research project using digital data.” http:// 
datalib.edina.ac.uk/mantra/index.html 


INTRODUCTION


Chapter 1 provided a description of the DIL 
competencies as they were initially conceived 
(Carlson, Fosmire, Miller, & Sapp Nelson, 
2011). Chapter 3 discussed how the compe- 
tencies were modified and used as a means of 
gathering information for the DIL project. A 
primary objective of the DIL project was to 
create instructional interventions based on 
these competencies and to explore data-related 
educational needs within the lab environment. 
Faculty partners informed this process through 
in-depth interviews and by responding to the 
instruction proposed. 

As we were conducting the DIL project we 
recognized a need for continued development 
of the DIL competencies. Through the inter- 
views, faculty responded to the competencies 
in light of their own experiences of data man- 
agement. For each specific competency faculty 
interviewees were asked, “Are there any skills 
that are not listed in this competency that you 
think should be included?” The responses pro- 
vided guidance for how the DIL competencies 
might be enhanced, altered, or removed alto- 
gether in future versions. 

This chapter explores the faculty-proposed 
changes to the DIL competencies, which are 
listed here in an order that follows an approxi- 
mate relationship to the data life cycle. 


• Discovery and acquisition of data
• Databases and data formats
• Data conversion and interoperability
• Data processing and analysis
• Data visualization and representation
• Data management and organization
• Data quality and documentation
• Metadata and data description
• Cultures of practice
• Ethics and attribution
• Data curation and reuse
• Data preservation


Following the suggested changes and a dis- 
cussion on their implications, this chapter will 
describe future research areas that would en- 
hance understanding of disciplinary practices 
and curriculum design for these competencies. 


DISCOVERY AND 
ACQUISITION OF DATA 


Skills in this competency include the following: 


• Locates and utilizes disciplinary data re-
  positories
• Evaluates the quality of the data available
  from external sources
• Not only identifies appropriate external
  data sources, but also imports data and
  converts it when necessary so it can be
  used locally


Students need critical thinking skills and tech- 
niques to retrieve data from a source external 
to the research laboratory or classroom. Gener- 
ally, interviewees agreed with the content of the 
skills list presented in the interview, with a few 
exceptions. One faculty member focused on 
using critical thinking to evaluate the contents 
of an externally produced data set for quality. 
The faculty member did not describe the ac- 
tual metrics by which an individual evaluates 
data quality. However, the need for metrics was 
implied. 


I think also, the skill to evaluate the quality of
data. It’s very easy for anyone to publish data
online and very often when we get it, it’s not
very useful. So we need to look at this and
make a decision and say, “Okay. It’s helpful
for us,” or “It’s not useful.”
— ELECTRICAL AND COMPUTER ENGINEERING FACULTY MEMBER


Another interviewee agreed that an “ap- 
propriate level of skepticism of outside data 
sources” was important. He explained: 


Know your source; know your quality, partic- 
ularly when we're working with remote sens- 
ing GIS data sets. Just understand that they're 
inaccurate, there’s no way around it. 

— AGRICULTURAL AND BIOLOGICAL ENGINEERING FACULTY MEMBER


This revealed the need for analytical think- 
ing around quality for a specific type of data: 
GIS (geographic information system). And it 
raised the question of whether different data 
types require different metrics of quality and 
whether they already exist within disciplines. If 
so, knowledge of the existence of disciplinary 
measures of data set quality may be an appro- 
priate addition to the discovery and acquisition 
of data competency. 

Faculty also raised a concern about negotiat- 
ing access to externally acquired data sets. 


To evaluate a hypothesis, you need to find 
data. Data sets like this are not going to be 
available, so really what you need to be able to 
do is to understand how to create this data and 
then to figure out who has the ability to create 
this data and then who has the authority to 
allow you access to the data. This is the kind 
of thing that I would be involved in. My stu-
dents would figure out how to create the data,
and then I would be figuring out who has the
ability to collect it and who has the author-
ity to give us the data. And then talking with
. . . [the data producers] about how to gener-
ate this data, whether they’re open to it, and
how to generate this data in a way that doesn’t
impact their business and doesn’t expose the
privacy of anything that they care about.
— ELECTRICAL AND COMPUTER ENGINEERING FACULTY MEMBER


This faculty member identified that needed 
data may not be publicly accessible. To intuit 
who might create data, to make inquiries into 
the existence of the data, and to negotiate ac- 
cess to the data is a complex access process. 
Extensive knowledge of the literature, the dis- 
cipline, and institutional structures to identify 
those who may be collecting data and an in- 
troduction to basic usage agreement terms may 
be appropriate additions to the discovery and 
acquisition of data competency. 


DATABASES AND DATA FORMATS 


Skills in this competency may include the fol- 
lowing: 


• Understands the concept of relational da-
  tabases and how to query those databases
• Becomes familiar with standard data for-
  mats and types for the discipline
• Understands which formats and data
  types are appropriate for different re-
  search questions


The critique of the databases and data for- 
mats competency included a related skills list 
for this area. The comments focused on deci- 
sion making in the design of databases. 

The thing that I don’t really see included here
is an understanding of some of the implica-
tions of the different types of databases. . . .
There are several different database products,
and within those database products there are
usually multiple database engines. So, for
example, in MySQL you have a choice be-
tween the ISAM engine and an[. . .]other
one. But one has higher data integrity and
is sort of more enterprise ready but requires
more memory. So the other one is the default
installed engine for MySQL, because it is a
lower resource usage.
— ELECTRICAL AND COMPUTER ENGINEERING FACULTY MEMBER


This reflection implies the need for nu- 
anced understanding of the strengths and 
weaknesses of database products and pro- 
gramming languages. It appears that gradu- 
ate students and faculty researchers use dis- 
ciplinary expertise, technical information, 
and research planning and vision to create 
criteria by which to judge the most appropri- 
ate database products and features that will 
contribute to an efficient, successful research 
project. The “development of criteria for deci- 
sion making” may be an overarching compe- 
tency in DIL, related to the need for critical 
thinking throughout the research life cycle. 
Another issue was the time students have to 
develop these skills. 


Capabilities for statistical analysis are a little 
weak. And there are courses they can take on 
campus for the statistical and the relational 
databases, so maybe it’s something that we 
should be requiring. The problem is that if 
they’re going to do a Master’s thesis, they take 
only seven courses. Two of them have to be 
outside of the department, so I guess . . . we
could ask to make sure that one of those is
either a database course or a statistical analysis
course.
— CIVIL ENGINEERING FACULTY MEMBER


“Critical thinking about the development 
of building a database” was also reported as a 
needed enhancement of the DIL competencies. 


For my discipline at least, understanding 
those concepts of how to build a good data- 
base would be important in addition to sim- 
ply knowing how to create tables and query- 
ing. And maybe that’s implied by “concept 
of relational databases,” but to me it wasn’t 
there. 

— ELECTRICAL AND COMPUTER ENGINEERING FACULTY MEMBER


In this case, critical thinking applies to the 
design of the database. This faculty interviewee 
called for the addition of “best practices of da- 
tabase design” to the databases and data formats 
competency. This may result in knowledge of 
the most appropriate, efficient ways to program 
a database. 
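
As one illustration of what “best practices of database design” might look like in a teaching session, the sketch below uses Python’s built-in sqlite3 module. The table names, column names, and sample values are hypothetical and are not drawn from the interviews. It shows only two common design habits: storing each fact once and linking tables by key, and letting the database reject invalid rows.

```python
import sqlite3

# In-memory database; all table names, column names, and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE site (
    site_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE            -- each site is described exactly once
);
CREATE TABLE reading (
    reading_id INTEGER PRIMARY KEY,
    site_id    INTEGER NOT NULL REFERENCES site(site_id),
    taken_on   TEXT NOT NULL,               -- ISO 8601 date kept as text
    flow_cms   REAL NOT NULL CHECK (flow_cms >= 0)
);
""")

conn.executemany(
    "INSERT INTO site (site_id, name) VALUES (?, ?)",
    [(1, "Upstream"), (2, "Downstream")],
)
conn.executemany(
    "INSERT INTO reading (site_id, taken_on, flow_cms) VALUES (?, ?, ?)",
    [(1, "2014-05-01", 3.0), (1, "2014-05-02", 4.0), (2, "2014-05-01", 5.5)],
)


def mean_flow_by_site(connection):
    """Join the two tables and return (site name, reading count, mean flow) rows."""
    return connection.execute(
        """
        SELECT s.name, COUNT(*), AVG(r.flow_cms)
        FROM reading AS r
        JOIN site AS s ON s.site_id = r.site_id
        GROUP BY s.name
        ORDER BY s.name
        """
    ).fetchall()


def insert_is_rejected(connection, site_id):
    """Return True if the database refuses a reading for the given site_id."""
    try:
        connection.execute(
            "INSERT INTO reading (site_id, taken_on, flow_cms) VALUES (?, ?, ?)",
            (site_id, "2014-05-03", 1.0),
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

Calling `insert_is_rejected(conn, 99)` shows the design choice at work: a reading that points to a site that does not exist is refused by the database itself rather than caught later during analysis.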

Enhanced decision making and critical 
thinking skills were a necessary addition when 
choosing appropriate file formats for a given 
research project. 


I would add to this the skill of understand- 
ing the advantages of different types of for- 
mats of files. This issue of knowing text files 
are human readable but not necessarily com- 
puter readable; XML files on the other hand 
are computer readable but bloated and inef- 
ficient; binary files are [at risk for having] 
insufficient documentation of their format, 
but are generally most efficient. If you're go- 
ing to work with text files, you have choices. 
You can do delimiting between fields; you can
have things in front that tell you how long
fields are. Students must understand the trade-
offs in using files in these ways and how it
makes them easier or harder to work with.
— ELECTRICAL AND COMPUTER ENGINEERING FACULTY MEMBER


Again, the necessity of picking the best 
tool for the job—this time for choosing file 
formats—is an important addition. This fac- 
ulty interviewee considered critically analyzing 
strengths and weaknesses of available formats 
and understanding and predicting the con- 
sequences of the choice of file format for the 
long-term management of the research proj- 
ect to be a foundational skill. This choice rep- 
resents a key decision in the research process 
that can have impact throughout the research 
life cycle. Helping students to identify those 
key decisions and make wise choices for their 
research can be addressed through the DIL 
competencies by including “the development 
of standard operating procedures or decision 
matrixes.” 
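
The tradeoffs this interviewee describes can be made concrete with a short sketch. The record, the field layout, and the function names below are hypothetical; the example writes the same record as delimited text, as length-prefixed text (“things in front that tell you how long fields are”), and as binary. Note that the binary form cannot be read back without the documented layout string.

```python
import csv
import io
import struct

# Hypothetical record: (sample id, temperature in degrees C, depth in m)
record = ("S001", 21.5, 3)


def to_delimited(rec):
    """Delimited text: fields separated by commas, readable by people and programs."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(rec)
    return buf.getvalue()


def to_length_prefixed(rec):
    """Length-prefixed text: each field is preceded by its two-digit length."""
    return "".join(f"{len(str(value)):02d}{value}" for value in rec)


def from_length_prefixed(text):
    """Read fields back by following the length prefixes; values come back as strings."""
    fields, pos = [], 0
    while pos < len(text):
        length = int(text[pos:pos + 2])
        fields.append(text[pos + 2:pos + 2 + length])
        pos += 2 + length
    return fields


# The binary layout must be documented somewhere, or the bytes are unreadable:
# little-endian, 4-byte ASCII id, 8-byte float, 4-byte integer.
BINARY_LAYOUT = "<4sdi"


def to_binary(rec):
    return struct.pack(BINARY_LAYOUT, rec[0].encode("ascii"), rec[1], rec[2])


def from_binary(blob):
    sample_id, temperature, depth = struct.unpack(BINARY_LAYOUT, blob)
    return (sample_id.decode("ascii"), temperature, depth)
```

In this sketch the text forms return every value as a string, so the reader must know which fields are numbers, while the binary form preserves types but depends entirely on `BINARY_LAYOUT` being recorded alongside the data.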


DATA CONVERSION AND 
INTEROPERABILITY 


Skills in this competency include the following: 


• Is proficient in migrating data from one
  format to another.
• Understands the risks and potential loss
  or corruption of information caused by
  changing data formats.
• Understands the benefits of making data
  available in standard formats to facilitate
  downstream use.


For the faculty interviewees, this was an 
area that was crucial but not as explicit as they 
would have liked. Faculty interviewees fo-
cused on the regular replacement of versions
of software that leads to problems for future 
use of data. 


[Students need an] understanding that for- 
mats like Microsoft Word .doc files are spe- 
cific and proprietary to Microsoft and that 
there is a need to store those in some format 
which you can be certain that you can open 
again later. The problem with .doc files is that 
the only reliable way to open them is to use 
the version of Word that created them. If that 
version of Word becomes outdated or runs 
on machines that are too old, then you never 
know what you’re going to get.
— ELECTRICAL AND COMPUTER ENGINEERING FACULTY MEMBER


The concepts of “format obsolescence” and 
“changes to software over time” need to be ad- 
dressed in the DIL competency skills list. 

There was concern that students do not 
think critically about the impact data conver- 
sion has on the contents of the data. While the 
competency did address this, faculty interview- 
ees specifically identified student data as poten-
tially problematic because students tended to
make conversions without fully understanding
the ramifications. One interviewee went so far 
as to say: 


I don’t know that students are aware that they 
tend to have more faith in their data than I do. 
— AGRICULTURAL AND BIOLOGICAL ENGINEERING FACULTY MEMBER


The revelation that data conversion may call 
into question the quality of data ties to the need 
for students to “think critically throughout the 
data management process” and “recognize the 
implications of their decisions.” 
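
A small round-trip check is one way to make the effect of a conversion visible to students. The sketch below is illustrative only: the rows, field names, and the “careless” export are hypothetical. It converts data to CSV text in two ways, reads each result back, and reports which fields no longer match the original.

```python
import csv
import io

# Hypothetical rows: a site code with leading zeros and a measured value
rows = [("00123", 12.3456789), ("00045", 0.000123)]


def export_careless(data, digits=2):
    """Mimic a lossy conversion: site codes become integers, values are rounded."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for site, value in data:
        writer.writerow([int(site), f"{value:.{digits}f}"])
    return buf.getvalue()


def export_faithful(data):
    """Keep site codes as text and write values at full precision."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for site, value in data:
        writer.writerow([site, repr(value)])
    return buf.getvalue()


def read_back(text):
    """Parse the CSV text into (site, value) pairs again."""
    return [(site, float(value)) for site, value in csv.reader(io.StringIO(text))]


def lossy_fields(original, converted):
    """Return (row index, field name) for every field that did not survive."""
    problems = []
    for index, ((site0, value0), (site1, value1)) in enumerate(zip(original, converted)):
        if site0 != site1:
            problems.append((index, "site"))
        if value0 != value1:
            problems.append((index, "value"))
    return problems
```

Here `lossy_fields(rows, read_back(export_careless(rows)))` flags every field, while the same check on `export_faithful` returns an empty list. The point is not these particular functions but the habit of comparing converted data against the original before trusting it.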


DATA PROCESSING AND ANALYSIS


Skills in this competency include the following: 


• Is familiar with the basic data processing
  and analysis tools and techniques of the
  discipline or research area
• Understands the effect that these tools
  may have on the data
• Uses appropriate workflow management
  tools to automate repetitive analysis of
  data


The DIL competencies as originally pro- 
posed did not mention programming explicitly. 
In the eyes of faculty interviewees, this was an 
area that needed to be included. 


[I would add] . . . these quantitative tools
that I mentioned about using programming
languages and knowing how to automate.
Scripts in R, for instance. We're doing a lot of
that in my lab now.
— ECOLOGY/LANDSCAPE ARCHITECTURE FACULTY MEMBER


For the faculty interviewees, the success of 
a student relies on efficient and proficient use 
of scripting and programming to process data. 
One of the faculty members we interviewed 
highlighted one student who was an excellent 
data manager because of excellent program- 
ming abilities, which allowed him to process 
data quickly and efficiently through the use of 
scripting. 
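
To give a sense of the kind of scripting the faculty had in mind, here is a minimal sketch of automating a repetitive summary across many files. The interviewee mentioned R; Python is used here only for illustration, and the folder layout, column name, and function names are assumptions rather than anything described in the interviews.

```python
import csv
import statistics
from pathlib import Path


def summarize_file(path, column):
    """Read one CSV file and summarize a single numeric column."""
    with open(path, newline="") as handle:
        values = [float(row[column]) for row in csv.DictReader(handle)]
    return {
        "file": Path(path).name,
        "n": len(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


def summarize_folder(folder, column, pattern="*.csv"):
    """Apply the same summary to every matching file, in a stable (sorted) order."""
    return [summarize_file(path, column) for path in sorted(Path(folder).glob(pattern))]
```

Because the same function is applied to every file, the processing steps are recorded in the script itself and can be rerun when new data arrive, which is part of what makes scripting useful for data management as well as for speed.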

A major aspect of being an excellent pro- 
grammer is the ability to learn new languages 
and techniques for the processing of data and 
then implement those techniques in an appro- 
priate way. 


So the other piece is learning how to learn
new techniques, right? How to go to the lit-
erature, how to go to the web, how to pick up
and teach yourself new tools.
— ECOLOGY/LANDSCAPE ARCHITECTURE FACULTY MEMBER


This “lifelong learning ability” (to under- 
stand and add tools to a personal research rep- 
ertoire) facilitates the graduate student's ability 
to manage research data. The need for lifelong 
learning skills is imperative for all disciplines in 
scientific research, but it is rarely explicit. The 
need for making this long-term acquisition of 
skills apparent and built into the research expe- 
rience and courses emerged consistently across 
the interviews. 


Certainly most . . . basic use of the tools [is 
learned] in a statistics class or a methodology 
class or something like that. But to me what 
happens is that . . . [students] tend to learn 
fairly basic application in those classes and 
then the transference of learning those tools 
to applying them toward a specific research 
project, critically, are very different skills. And 
they get that mostly in one-on-one mentor- 
ship. . . . I mean any faculty member work- 
ing with a graduate student on their thesis. To 
me, that’s the mentorship. 
— ECOLOGY/LANDSCAPE ARCHITECTURE FACULTY MEMBER


Some faculty expressed concern that analysis 
tools changed the data. A faculty interviewee 
was adamant that students understand that raw 
data should be kept in an unaltered state. 


I mean, we don’t change the data. Once the 
data is there, I don’t want them changing the 
data. . . . This is very important. 


— NATURAL RESOURCES FACULTY MEMBER 


The researchers alter and analyze data sets,
but the raw data should be preserved. “Keeping
the raw data” should be added to this compe-
tency to underpin conversion and interoper-
ability.
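
One way to practice “keeping the raw data” is to write every processed result to a new file and confirm that the raw file is unchanged. The sketch below is a hedged illustration: the file names and the cleaning step (dropping blank lines) are hypothetical, and a checksum is only one of several ways to verify that a file has not been altered.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return a checksum that changes if even one byte of the file changes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def derive_clean_copy(raw_path, out_path):
    """Write a processed copy (blank lines dropped) without touching the raw file."""
    lines = Path(raw_path).read_text().splitlines()
    kept = [line for line in lines if line.strip()]
    Path(out_path).write_text("\n".join(kept) + "\n")
    return len(lines) - len(kept)  # number of lines removed


def demo():
    """Create a small raw file, derive a clean copy, and verify the raw file."""
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw_readings.csv"      # hypothetical file names
        clean = Path(tmp) / "clean_readings.csv"
        raw.write_text("site,value\nA,1.0\n\nB,2.5\n")

        checksum_before = sha256_of(raw)
        dropped = derive_clean_copy(raw, clean)
        checksum_after = sha256_of(raw)

        return checksum_before == checksum_after, dropped, clean.read_text()
```

Recording the checksum of the raw file when it is first collected gives students a simple test they can repeat later to show that analysis steps worked on copies rather than on the original.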


DATA VISUALIZATION 
AND REPRESENTATION 


Skills in this competency include the following: 


• Proficiently uses basic visualization tools
  of discipline
• Avoids misleading or ambiguous repre-
  sentations when presenting data in tables,
  charts, diagrams, and so forth
• Chooses the appropriate type of visualiza-
  tion, such as maps, graphs, animations,
  or videos, based on an understanding of
  the reason/purpose for visualizing or dis-
  playing data


Data visualization and representation re- 
ceived the most feedback by far. Faculty agreed 
that this was a fundamental competency for 
which the vast majority of graduate students 
needed to develop advanced skill sets. 


I’d say it’s essential because it’s communica-
tion. If we don’t communicate, we haven’t
done much in the long run.
— ECOLOGY/LANDSCAPE ARCHITECTURE FACULTY MEMBER


The suggestions fell into a broad spectrum
of interests and concerns. A frequent refrain 
was the need to identify data that tell a story. 


“Avoids misleading or ambiguous representa- 
tions when presenting data,” I'd also put in 
there saying what data not to show. 

— ELECTRICAL AND COMPUTER ENGINEERING FACULTY MEMBER

I would say the thing that I don’t see here
that’s most important is being able to evalu- 
ate, “Does this graph show what I expected it 
to show?” 

— ELECTRICAL AND COMPUTER ENGINEERING FACULTY MEMBER


For a graduate student to gain this compe- 
tency, he or she must have a clear understand- 
ing of what to communicate, and then, what 
the visualization is actually communicating. 
Critical analysis identifies which data fields 
heighten understanding or increase explication 
of the findings. This necessitates a higher level 
of understanding about the content of the data 
set and the research project as a whole. 

Faculty interviewees took that need for criti- 
cal thinking further, to address the use of data 
visualizations to make conclusions that are valid. 


I think some of these ideas are introduced in 
courses and probably they see it in practice. I 
think it is something—at a basic level—used 
within the discipline. But I don’t think that 
the part of understanding how these [visual- 
izations] can be used to support the decision 
making process [is present]. And that may be 
a skill—you know—of connecting it to that. 
So if they understand the reason or purpose 
for visualizing, then [they can] utilize . . . [vi- 
sualizations] in support of making decisions. 
— ELECTRICAL AND COMPUTER ENGINEERING FACULTY MEMBER


The need for informed decision making 
based upon visualization and representation 
again ties to the need for critical thinking across 
the data life cycle. In this case, critical thinking 
extends to asking questions of the data for the 
purpose of understanding it and making in- 
formed decisions. 

Finally, the concept of students “learning
multiple representation tools and choosing the
most appropriate tool for the story they wish to
tell” was a necessary addition to the DIL competency.


I wish my graduate students had much more 
background in a broader array of representa- 
tional tools[,] . . . [like] the students I teach 
in landscape architecture. We teach them 
those skills explicitly in our coursework; we 
spend huge amounts of time. We teach them 
representation. We have courses where they 
learn that it’s not just a media skill, but . . .
[they] are representing information and data, 
and how to do that compellingly. The science 
students get very little of that. 
—ECOLOGY/LANDSCAPE ARCHITECTURE FACULTY MEMBER


The idea that successful future scientists need 
new media skills is an area in which there is a 
need for additional research before proposing 
revisions to the competencies. The DIL project 
has just scratched the surface of the needs in 
visualization. 


DATA MANAGEMENT 
AND ORGANIZATION 


Skills in this competency include the following: 


• Understands the life cycle of data, develops
data management plans, and keeps
track of the relation of subsets or processed
data to the original data sets

• Creates standard operating procedures
for data management and documentation


This competency was generally supported 
by the faculty interviewees. 


I'd say part of the standard operating proce- 
dures are to have very high levels of annota- 
tion, whether it be in your programming files 
or in your data sets themselves about what the 
data is, what the units are, when it was col- 
lected, where there might be errors. I mean 
I keep extensive records of my own notes so 
that at the end, I look back and I’ve crossed 
off every single thing I’ve needed to do for 
every single line of data. I don’t think most 
students are that thorough. 
—ECOLOGY/LANDSCAPE ARCHITECTURE FACULTY MEMBER


However, establishing best practices and us- 
ing them presented a difficulty arising from the 
lack of consensus regarding what the best prac- 
tices entail. 


When you say “utilizes best practices and 
understands the importance of frequently 
updating their understanding of what best 
practices are,” in order to do that, one has 
to have readily available sources that tell you 
what they are. 

—ECOLOGY/LANDSCAPE ARCHITECTURE FACULTY MEMBER


Standard operating procedures vary across 
disciplinary practice and research methodol- 
ogy. Designing and consistently using a stan- 
dard operating procedure requires in-depth 
knowledge of how the different types of equip- 
ment and techniques used within the labora- 
tory impact the collection of data. This systems 
thinking may be intuitive to the faculty inter- 
viewees. However, it is unclear how graduate 
students design research methodologies that 
may be based on an entire laboratory of meth- 
odologies and equipment without documenta- 
tion about those environments. This need for 
specifying not only what skills the graduate
students need, but also at what point in their
research careers they are likely to learn these 
skills, points to a larger issue—namely that not 
all data competencies may be needed or learned 
during the graduate research phase of a scien- 
tific researcher’s career. 


DATA QUALITY AND DOCUMENTATION 


Skills in this competency include the following: 


• Recognizes, documents, and resolves any
apparent artifacts, incompletion, or corruption
of data

• Utilizes metadata to facilitate an understanding
of potential problems with
data sets

• Documents data sufficiently to enable reproduction
of research results and data by
others

• Tracks data provenance and clearly delineates
and denotes versions of a data set


Few faculty interviewees wanted to augment 
or change this area. In one case, however, the 
faculty member sought to clarify a type of doc- 
umentation that he felt was important but was 
not referenced in the DIL competencies. The 
need for a “story” of the changes that a data set 
goes through was the primary concern. 


Interviewer: So you’re saying that the
amount of documentation and description is
good for your purposes—you can get a sense
of what they’re doing—but it wouldn’t be
enough for someone else outside of your lab 
to make sense of it. What is the gap there? 
What would be needed for somebody else to 
understand? 

Faculty: I think in my lab we have a cumulative
knowledge. So from the beginning,
we know . . . for example, that we have a
research proposal. So we know the basic 
idea of what we want to do. And then we 
do some experiments and then the next ex- 
periment, we write the difference from the 
previous one. Then, so if you accumulate 
knowledge, then you understand the differ- 
ence. Then you look at something and say, 
“Okay, I understand where it comes from.” 
But for somebody else, just by looking at the 
difference, it does not make enough sense. 
... To understand what this means. Unless 
you understand the history. 

—EXCHANGE WITH ELECTRICAL AND COMPUTER ENGINEERING FACULTY MEMBER


A major concern of professors was that in- 
dividuals outside their laboratory may mis- 
understand, misrepresent, or misuse their 
data, if shared. The importance of providing 
not simply the context for an individual data 
set but also the context for the data set in 
relation to the entire project is a nuanced 
change that needs to be included in the com- 
petencies. 

Another nuance regarding data quality and 
documentation is the use of externally written 
documentation (such as documentation for a 
software programming language) when creat- 
ing new data products. Particularly with the 
reuse of software code, faculty found that the 
successful use of outside documentation is a 
necessary skill for students. The use of outside 
documentation reflects the need to establish 
the context of the production of a new data 
object, regardless of the origin of those con- 
text documents. 


Maybe include outside documentation. This 
seems to imply that it’s all about organizing 
the data itself, putting things in the data. But
I’ve found oftentimes outside documentation
is actually . . . more helpful than just looking
at code. 
—ELECTRICAL AND COMPUTER ENGINEERING FACULTY MEMBER


“Knowledge of tools to assist with the cre- 
ation of documentation” was brought up as a 
necessary addition, particularly in the context 
of software developers. For specific software 
programming languages, there are software 
tools that collate and/or create documentation 
from the software as it is written. 
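
As a rough illustration of this idea, the sketch below uses Python's standard `inspect` module to gather documentation strings that were written alongside the code. The function names are hypothetical and do not come from the interviews; the documentation tools available differ from one programming language to another.

```python
import inspect


def convert_celsius_to_kelvin(temperature_c):
    """Convert a temperature from degrees Celsius to kelvins.

    Hypothetical example function; the units are stated in the
    docstring so that collected documentation carries them along.
    """
    return temperature_c + 273.15


def collect_documentation(functions):
    """Return a mapping of each function's name to its cleaned docstring."""
    return {f.__name__: inspect.getdoc(f) for f in functions}


# Gather the documentation that was written with the code itself.
docs = collect_documentation([convert_celsius_to_kelvin])
for name, text in docs.items():
    print(name)
    print(text)
```

Because the description lives next to the code, it can be collected again whenever the code changes, which is the behavior the interviewees attributed to such tools.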


METADATA AND DATA DESCRIPTION 


Skills in this competency include the following: 


• Understands the rationale for metadata
and proficiently annotates and describes
data so it can be understood and used by
self and others

• Develops the ability to read and interpret
metadata from external disciplinary
sources

• Understands the structure and purpose of
ontologies in facilitating better sharing of
data


Metadata and data description were gener- 
ally accepted as necessary, but not very well un- 
derstood, aspects of data management by the 
faculty interviewees. While the competencies 
as written were generally held to be accurate 
by the faculty, one interviewee felt that an even 
more basic need existed. 


Almost maybe even a basic level of understand- 
ing the rationale for metadata, but even just 
. . . a basic understanding, basic knowledge of
[the concept of] metadata. And examples. 

—ELECTRICAL AND COMPUTER ENGINEERING FACULTY MEMBER


The authors of the competencies assumed 
that there was a need for all graduate students 
to understand that such a thing as metadata ex- 
ists, that it provides some basic level of function 
to a data set, and that it can be useful during re- 
search projects. However, this assumption may 
have been unrealistic. 


In the GIS world, you’re at the mercy of other
people’s data. You’re the beneficiary and at
their mercy. I don’t know how many students
go into metadata. I mean certainly when I was 
learning it until probably like four or five years 
ago, I didn’t go into the metadata that much. 
But I now use it as the source to describe what I 
think about this data, what are the caveats to it. 

—ECOLOGY/LANDSCAPE ARCHITECTURE FACULTY MEMBER


This concept may be outside of graduate 
students’ previous experiences with data man- 
agement. A missing step might be to explain 
what metadata is and why it is useful. 
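
One way to make that explanation concrete is a small worked example. The sketch below, written in Python, pairs a handful of invented measurements with a minimal metadata record. The field names and values are assumptions chosen for illustration, not a disciplinary standard, and they do not come from the DIL project.

```python
import json

# Hypothetical measurements; the values alone do not say what they mean.
measurements = [12.4, 13.1, -999.0, 12.9]

# A minimal, illustrative metadata record (field names are assumptions,
# not a disciplinary standard).
metadata = {
    "title": "Stream temperature, site A",
    "variable": "water temperature",
    "units": "degrees Celsius",
    "collected": "2013-06-01",
    "missing_value": -999.0,
    "caveats": "Sensor was recalibrated partway through collection.",
}


def valid_values(values, record):
    """Drop entries equal to the missing-value code named in the metadata."""
    return [v for v in values if v != record["missing_value"]]


clean = valid_values(measurements, metadata)
print(json.dumps(metadata, indent=2))
print(sum(clean) / len(clean), metadata["units"])
```

Without the record, a reader would have no way of knowing that -999.0 marks a missing reading rather than a measurement, or what units the remaining values are in.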


CULTURES OF PRACTICE 


Skills in this competency include the following: 


• Recognizes the practices, values, and
norms of his or her chosen field, discipline,
or subdiscipline as they relate to
managing, sharing, curating, and preserving
data

• Recognizes relevant data standards of
his or her field (e.g., metadata, quality,
formatting) and understands how these
standards are applied


There were mixed responses to this compe- 
tency. No one rejected or augmented any of the 
skills listed. However, many of the respondents 
focused on the idea that cultures of practice
remained unformed within their discipline.
Even so, some thought that it was important. 


This is really important, and I think that it’s 
such a changing target right now. I think it’s 
the journal requirements and the funding re- 
quirements that are making it important and 
making it essential. . . . They’re absolutely
right to do so. 

—ECOLOGY/LANDSCAPE ARCHITECTURE FACULTY MEMBER


But respondents were unclear as to what 
comprised cultures of practice. The ecology/ 
landscape architecture faculty member went 
on to say: “But it’s something that most of us 
are ill-prepared for. We're just sort of like, ‘Oh, 
okay. What do we do?’ And we ourselves have 
had very little training in this.” 

On the other hand, one faculty member 
described this as not being critical for the stu- 
dents to do their work. 


It’s probably . . . you know, they can do their 
work without understanding this. It’s not es- 
sential that they have this. It’s best if they do, 
but they don’t. I convey it to them just simply
through our discussions of what we’re doing,
why we're doing it, and so on. I guess I could 
be doing more, but we don’t talk about all of 
these functions. I mean we talk about some of 
them, but not all of them. 


—CIVIL ENGINEERING FACULTY MEMBER


Given this lack of clarity with a simultane- 
ous indication of importance, this competency 
needs to be investigated further. It is unclear 
that the definition of cultures of practice pro- 
posed is reflective of current scientific research 
practice. 


I don’t even know if there are practices, values,
and norms. I would love . . . guidance.
So the question is, do our graduate students
know these things? I mean, I’m between “I
don’t know” to “I guess,” because we’ve sort
of ignored it. . . . So I’m not sure which is the
right way to frame that. But no, we're clueless 
about this. How’s that? 


—NATURAL RESOURCES FACULTY MEMBER


ETHICS AND ATTRIBUTION 


Skills in this competency include the following: 


• Develops an understanding of intellectual
property, privacy and confidentiality
issues, and the ethos of the discipline
when it comes to sharing and administering
data

• Acknowledges data from external sources
appropriately

• Avoids misleading or ambiguous representations
when presenting data


The ethics and attribution competency briefly 
mentions intellectual property. This emerged as 
a problem. Given the complex nature of intel- 
lectual property in the research software field, 
a faculty interviewee spelled out what a gradu- 
ate student who creates software should know 
about intellectual property. 


Interviewer: What do you think your grad- 
uate students should know regarding intellec- 
tual property and these sorts of issues? 

Faculty: There are two answers to that. My 
first answer is that he shouldn't worry about 
it. As a Ph.D. student, he should focus on do- 
ing the research and publishing that research 
and graduating. Now, clearly that answer is 
not complete, because it ignores all of the 
problems that come with ignoring intellectual
property and it got us to where we are
today. But that is probably what’s best for him
in the short-term. Second, what is best for
the lawyers especially who will have to help 
us deal with it if we ever had to deal with it, 
is for him to understand that when he’s work- 
ing on something he needs to be cognizant 
of whose resources is he using, who is paying 
for his time, and who currently owns what 
he is doing. Right, so he should be aware of 
Purdue's policies on work that he’s doing and 
who owns the work that he’s doing. 

—ELECTRICAL AND COMPUTER ENGINEERING FACULTY MEMBER


The faculty interviewee was ambivalent 
about the utility of this type of knowledge to 
a student while explicitly listing this as a skill 
that the student needed to successfully man- 
age data. There was an acknowledgment that 
lack of knowledge of intellectual property is- 
sues can lead to problems of data management. 
However, this took time that in the professor's 
viewpoint needed to be spent on primary re- 
search and the development of a dissertation 
project. This ambivalence is representative of 
how faculty members felt about a number of 
DIL competencies. They listed many as very 
important or essential while simultaneously 
agonizing about how little time was available 
to teach students the competencies while meet- 
ing research deadlines. 

The same faculty interviewee clarified the 
reference to patents in the ethics and attribu- 
tion list. 


They should probably be taught the pros and 
cons of patents on hardware and software and 
inventions and what that means. And given 
the concept of what it means to invent some- 
thing. They should understand something 
about this issue of “first to invent” versus 
“first to file” and therefore the importance
of documenting everything that you think
of. Although, did the system change? It used
to be first to invent, and I think it may have
switched to first to file?

—ELECTRICAL AND COMPUTER ENGINEERING FACULTY MEMBER


It is clear that the proposed competency is 
important but that more detail would reflect a 
nuanced understanding of the needs of gradu- 
ate students as they transition to researchers. 
Disciplinary researchers are sometimes unclear 
themselves on the terms and conditions under 
which it is important to file for patents. This 
is a clear opportunity in which libraries may 
contribute to the DIL of graduate students. 
Patent librarians and copyright librarians both 
have expertise to teach developing researchers 
in these areas. 


DATA CURATION 
AND REUSE 


Skills in this competency include the following: 


• Recognizes that data may have value beyond
the original purpose, to validate research,
or for use by others

• Is able to distinguish which elements of a
data set are likely to have future value for
self and for others

• Understands that curating data is a
complex, often costly endeavor that is
nonetheless vital to community-driven
e-research

• Recognizes that data must be prepared
for its eventual curation at its creation
and throughout its life cycle

• Articulates the planning and activities
needed to enable data curation, both generally
and within his or her local practice

• Understands how to cite data as well as
how to make his or her data citable


Interviewees commented that they were 
satisfied with the list as it was given; however, 
faculty might perceive this topic to be outside 
of their domain. One faculty member com- 
mented: “So, what is data curation?” This 
needs to be explored with a broader group of 
disciplinary faculty members. 


DATA PRESERVATION 


Skills in this competency include the following: 


• Recognizes the benefits and costs of data
preservation

• Understands the technology, resources,
and organizational components of preserving
data

• Utilizes best practices in preparing data
for its eventual preservation during its active
life cycle

• Articulates the potential long-term value
of his or her data for self or others and is
able to determine an appropriate preservation
time frame

• Understands the need to develop preservation
policies and is able to identify the
core elements of such policies


Interviewees rarely augmented the topic 
of data preservation. Faculty were less experi- 
enced with it. The few critiques elucidate pos- 
sible ways of describing the competency that 
researchers may respond to more readily. 


The only thing I'd add to this, when you say 
“utilizes best practices and understands the 
importance of frequently updating their
understanding of what best practices are . . .”
well, in order to do that, one has to have
readily available sources that tell you what 
they are. 

—ECOLOGY/LANDSCAPE ARCHITECTURE FACULTY MEMBER


Another faculty member focused in on the 
long-term, local reuse of the data by 


making sure that any data that you care about 
is accessible, is replicated, and is in a format 
that you can still read. 

—ELECTRICAL AND COMPUTER ENGINEERING FACULTY MEMBER


This response covered several DIL compe- 
tencies. However, it shows the crucial intercon- 
nectedness of data curation, reuse, and preser- 
vation in the mind of this faculty member. The 
roughly linear format in which we presented the 
competencies did not show their actual roles 
and interplays. Presenting them in a format that 
shows their interconnectedness may encourage 
researchers to perceive them differently. 
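
The faculty member's criteria of data being accessible and replicated can be sketched in code. The Python example below is a minimal sketch that assumes replication means keeping a second copy of a file; it compares SHA-256 checksums to confirm that the copy still matches the original. The file names are hypothetical, and checking that a format remains readable would require a separate, format-specific test that is not shown here.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, replica):
    """True when both files exist and their checksums are identical."""
    original, replica = Path(original), Path(replica)
    if not (original.is_file() and replica.is_file()):
        return False
    return sha256_of(original) == sha256_of(replica)


# Temporary files stand in for a data set and its replicated copy.
with tempfile.TemporaryDirectory() as folder:
    data = Path(folder) / "data.csv"
    backup = Path(folder) / "data_backup.csv"
    data.write_text("site,temp_c\nA,12.4\n")
    backup.write_text(data.read_text())
    intact = copies_match(data, backup)

    # Simulate a copy that has silently drifted from the original.
    backup.write_text("site,temp_c\nA,99.9\n")
    altered = copies_match(data, backup)

print(intact, altered)
```

A check of this kind touches preservation (the copy exists), curation (the copy is verified), and reuse (the verified copy can be trusted later), which echoes the interconnectedness described above.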


FURTHER DEVELOPING 
THE DATA INFORMATION 
LITERACY COMPETENCIES 


The need for critical thinking as a necessary pre- 
cursor to decision making about research proj- 
ects and for the design of new research projects 
emerged as a strong theme. Critical thinking 
is a fundamental trait of an information liter- 
ate individual (ACRL, 2014). The heavy focus 
by the faculty interviewees on this higher or- 
der thinking ability implies a need that is not 
present consistently among graduate students. 
There is a need for studies on how to instill critical
thinking around data management. This
would provide welcome insight into a crucial
facet of data management that builds a well- 
rounded scientist, as well as an information lit- 
erate individual. 

Further investigation is needed into whether 
critical thinking about data management is nec- 
essary in disciplines outside of science, technol- 
ogy, engineering, and mathematics (STEM), or 
if this emphasis is a manifestation of the scientific 
method that underpins research in the STEM 
disciplines. Extending DIL to include social sci- 
entists and humanities researchers would eluci- 
date whether the need for critical thinking skills 
with regard to data is truly universal. 

The primary addition identified for specific 
competencies was that of visualization skills. A 
variety of questions arose as fruitful areas of fu- 
ture study: 


• Do all graduate students in the sciences
need data visualization skills, or only students
in selected disciplines?

• Are there visualization skills or tools that
are most appropriate in specific disciplines?

• Do scientific disciplines now prize visualization
to the point that credit courses
in visualization are logical additions to
scientific graduate curricula?

• What role can visualization training play
in creating a successful scientist in the
long term?


These are areas of investigation that could have
long-term impact on professional success for 
scientists. 

The developers of the DIL competencies ex- 
plored the broad range of competencies STEM 
researchers need to be successful in working 
with data. There is a need for investigation re- 
garding how these competencies may be strate- 
gically embedded across higher education, from 
undergraduate programs, into graduate school,
and even into postdoctoral programs. Identifying
those skills that are appropriate at all stages
of the developing researcher’s career would help 
in planning to introduce skills “just in time” 
and in personally meaningful ways to students. 

The next step for this research is curriculum 
mapping: to identify, within an undergradu- 
ate or a graduate curriculum, the courses that 
are logical places to introduce basic concepts 
of data management into the curriculum 
(Harden, 2001). The DIL competencies pre- 
sume a basic understanding of data manage- 
ment concepts. However, the current curricula 
in most disciplines do not introduce these basic 
concepts systematically and progressively. Us- 
ing educational techniques such as scaffold- 
ing, which would incorporate elements of data 
management bit by bit (Dennen, 2004), from 
the beginning of undergraduate curricula could 
help graduate students have stronger prepara- 
tion for their research responsibilities. 

A limitation of this research was the narrow
scope of interviews and disciplines asked to
respond to the competencies. Intended to be an 
initial foray into faculty reactions to the compe- 
tencies and a few ways to teach them to gradu- 
ate students, the project included only a small 
number of STEM disciplines. A necessary next 
step will be to get feedback from many more 
faculty on their perception of the relevance, 
utility, and accuracy of the competency list.


INTRODUCTION


This chapter articulates future directions in 
advancing the practice of data information lit- 
eracy (DIL). Beyond further defining the 12 
DIL competencies, which is the subject of the 
previous chapter, I focus on the development 
of a strong community of practice in this area. 
Here I examine two sources of information in 
determining what these next steps could be: 
the established information literacy commu- 
nity of practice and the emerging commu- 
nity engaged in DIL. Librarians interested in 
furthering DIL could learn a lot from infor- 
mation literacy, particularly in the questions 
and challenges that they have addressed over 
the years. In the first part of this chapter, I 
examine the recently released draft of the As- 
sociation of College and Research Libraries’ 
(ACRL’s) framework for information literacy
and some of the literature produced by infor- 
mation literacy experts for insight. Next, I turn 
to transcripts from the discussions that took 
place at the DIL Symposium held in 2013 at 
Purdue University. The symposium was at- 
tended by more than 80 librarians, holding 
positions mostly in data services or informa- 
tion literacy, to explore roles, responsibilities, 
and approaches for librarians in teaching data 
competencies. Many insights for future direc- 
tions came out of the symposium that could 
provide an agenda for growth. 


EXPLORING DATA INFORMATION 
LITERACY THROUGH THE LENS 
OF INFORMATION LITERACY 


One of the central strategies of the DIL proj- 
ect was to leverage the investments made by 
the library community in understanding and 


responding to information literacy. The DIL 
case studies illustrated how we informed our 
work through the lens of information literacy. 
However, there are many additional avenues 
for exploring potential linkages between infor- 
mation literacy and DIL. 

This is an interesting time to examine how 
information literacy might inform and propel 
DIL forward as information literacy itself is 
undergoing a transition. In the year 2000, the 
Association of College and Research Libraries 
(ACRL) released Information Literacy Compe- 
tency Standards for Higher Education, which has 
largely defined how information literacy has 
been understood and practiced in academic li- 
braries in the 21st century (Bell, 2013). In 2011 
ACRL launched a task force to review the stan- 
dards to explore whether a revision was needed 
to better reflect current thinking on informa- 
tion literacy. The changes recommended by the 
task force included broadening the definition to 
include other types of literacies and creating a 
framework to connect these literacies, acknowl- 
edging affective and emotion-based learning 
outcomes rather than focusing exclusively on 
cognitive outcomes, and recognizing students 
as content creators and curators (ACRL Infor- 
mation Literacy Competency Standards Review 
Task Force, 2012). ACRL formed the Informa- 
tion Literacy Competency Standards for Higher 
Education Taskforce (http://www.ala.org/acrl 
/aboutacrl/directoryofleadership/taskforces 
/acr-tfilcshe) and charged them with updating 
these standards. This taskforce has released mul- 
tiple drafts over the course of 2014. The new 
framework for information literacy is still in a 
period of review as of this writing. The quotes 
and observations made in this chapter are based 
on the June 2014 iteration and may not be re- 
flective of the final document (http://acrl.ala 
.org/ilstandards/wp-content/uploads/2014/02 
/Framework-for-IL-for-HE-Draft-2.pdf). 


A major shift in the “Framework for Infor- 
mation Literacy for Higher Education” docu- 
ment is how it approaches information literacy. 
Rather than prescribing a set of expected out- 
comes, the framework focuses on identifying 
and connecting core concepts as well as encour- 
aging flexible implementations. This new frame- 
work for information literacy rests on threshold 
concepts. The June 2014 iteration of the frame- 
work document describes threshold concepts 
as “those ideas in any discipline that are pas- 
sageways or portals to enlarged understanding 
or ways of thinking and practicing within that 
discipline” (ACRL, 2014, p. 1 of Draft 2). From 
this perspective, information literacy becomes 
much more nuanced in implementation rather 
than teaching broadly defined skills to students 
through a one-size-fits-all approach. 

Using informed learning as its foundation, 
the ACRL (2014) framework document defines
information literacy as


a repertoire of understandings, practices, and 
dispositions focused on flexible engagement 
with the information ecosystem, under- 
pinned by critical self-reflection. The reper- 
toire involves finding, evaluating, interpret- 
ing, managing, and using information to 
answer questions and develop new ones; and 
creating new knowledge through ethical participation
in communities of learning, scholarship,
and practice. (p. 2 of Draft 2)


Another approach to information literacy 
that has gained attention is that of informed 
learning. Informed learning, as articulated 
by Bruce (2008), recognizes that “teaching 
and learning must bring about new ways of 
experiencing and using information and en- 
gage students with information practices that 
are relevant to their discipline or profession” 
(pp. viii–ix). A central component of informed
learning is looking at not only what people
learn but also how they learn it.


Data Information Literacy and 
Information Ecosystems 


There are strong alignments between informed 
learning, ACRL’s proposed framework for information
literacy, and the DIL project. The
DIL project was predicated on our developing 
an understanding of the contexts and environ- 
ments in which the faculty and graduate stu- 
dents worked. This included the environmental 
scans and literature reviews conducted by each 
of the five DIL project teams to identify how 
and to what extent selected fields of study dis- 
cuss issues relating to the 12 DIL competen- 
cies. It included gathering information about 
the structure and operation of the research lab 
in which the data were generated, and how 
the students we intended to teach used data. 
Through engaging in these activities, we con- 
structed a preliminary understanding of the 
“information ecosystem” of our students and 
were able to align our educational programs 
with disciplinary and local cultures of practice. 

However, there are many additional avenues 
for further exploration in understanding the 
information ecosystems as they pertain to stu- 
dents’ work and experiences with research data. 
Our interviews revealed that the educational 
experiences of students on data management 
and curation were often informal, uneven, and 
experiential. Therefore, a student’s information 
ecosystem, as it pertained to data, was likely to 
be ill-defined at best. Our exploration into dis- 
ciplinary and local information ecosystems of 
research data was primarily intended to inform 
the development of our educational programs. 
More research into information ecosystems as a 
foundation for generating, processing, analyzing,
applying, and disseminating research—and
how these ecosystems are understood and practiced
from the point of view of students and
faculty—would help librarians respond effec- 
tively to opportunities and needs. 


Data Information Literacy and 
the Challenge of Context 


Some research on information literacy postu- 
lates that an individual’s approach to informa- 
tion literacy is informed by his or her views of 
teaching, learning, and information literacy 
generally, which are adopted implicitly or ex- 
plicitly in different contexts (Bruce, Edwards, 
& Lupton, 2006). This finding on the impor- 
tance of how learning is experienced and the 
effect of context on the efficacy of information 
literacy has implications for DIL. 

Each of the five DIL teams operated in a 
different context and, as a result, each crafted 
different approaches for planning and imple- 
menting programs. Two case studies operated 
in a classroom setting. The Cornell University 
team created a stand-alone mini-course for 
credit, and the University of Minnesota team 
developed a hybrid program with an initial in- 
person session and then online learning mod- 
ules. The three other case studies took place 
“in context,” either within the laboratory or in 
the field. The Carlson and Sapp Nelson team 
from Purdue University worked on-site as 
embedded librarians in a lab. The Bracke and 
Fosmire team from Purdue offered a series of 
workshops in the lab space of the faculty part- 
ner. The team from the University of Oregon 
offered their program during a regular meet- 
ing of the faculty’s research team. Each team 
assessed the impact that their program had 
on student learning, but larger questions on 
context remain. For instance, to what extent 
did the setting for DIL education programs 
(e.g., classroom, online, lab) have an effect
on student learning? Will DIL programs have
a greater impact on student learning if their 
focus is on data that students are responsible 
for themselves, as opposed to data sets external
to their lab and used in a classroom environment?

There are additional opportunities for re- 
search on the contextual aspects of data skills 
that would aid our collective understanding 
and action on DIL. First, we need to develop a 
better understanding of students’ relationships 
to the data that they are generating or working 
with. How do they perceive their role as a pro- 
ducer of data, especially given that they typi- 
cally have varying degrees of authority over the 
data that they are working on? Do they view 
data as merely a means to an end (a recognized 
scholarly product such as a journal article or a 
graduate thesis), or do the data hold value for 
them as a unique information resource in its 
own right? 


Data Information Literacy in the Presence 
of Standards and Cultural Norms 


Ethical participation in communities of learn- 
ing and scholarship is a key component of 
ACRL’s draft “Framework for Information 
Literacy for Higher Education.” This repre- 
sents the importance of cultural connectedness 
in information literacy, an acknowledgment 
that an individual’s perceptions and actions 
as a producer and consumer of information are 
informed by, and in turn inform, the larger 
cultures of practice. This recognition of larger 
connections was inherent to the DIL project 
as well, and we incorporated cultures of practice 
into the DIL competencies so that we could 
both understand connections and impart them 
to a larger community. 

One of the challenges that we encountered 
was a lack of widely accepted standards or 
norms in the disciplines of our faculty partners 
for handling, managing, sharing, and curating 
research data. Many research communities are 
becoming more aware of the need to consider 
research data as an asset that has value outside 
of the lab in which they were generated. This 
recognition may be due to the mandates of 
funding agencies and increasing attention to 
data validity and access by high-impact jour- 
nals. Even when a community has launched 
discussions and is taking action to build knowl- 
edge and resources around making data acces- 
sible, these efforts may not be widely known 
beyond those few individuals or institutions 
taking the initiative. For example, DataONE is 
an initiative to build infrastructure and develop 
practices around sharing data sets about “life on 
earth and the environment that sustains it” that 
has received a great deal of support from the 
National Science Foundation (DataONE, n.d., 
“DataONE vision”). However, the University 
of Oregon team discovered that the ecology 
faculty partner had only a minimal awareness 
of DataONE. The Oregon team took this as an 
opportunity to introduce students and faculty 
in the lab group to DataONE. They used ma- 
terials generated by DataONE to discuss con- 
siderations and requirements for sharing data 
outside of the lab. 

This absence of widely adopted norms and 
practices for data management and curation 
presents both opportunities and challenges 
to developing and teaching DIL programs. 
Librarians can play an important role in con- 
necting researchers to the efforts of communi- 
ties that are addressing these issues. They can 
help these efforts take root through DIL edu- 
cation both on home campuses and through 
the professional associations within disciplines. 
In instances where community efforts have 
yet to catch on, librarians can act as a catalyst 
through education on issues and considerations 
for research data. Ultimately, it is up to the dis- 
cipline to take ownership and action regarding 
norms and practices surrounding research data. 
As DIL initiates change and spurs action, we 
need a better understanding of how best to fos- 
ter change within communities and how librar- 
ians might be effective agents of change. 


Data Information Literacy and Preparing 
Students for the Workplace 


Many in the library community realize that 
information literacy considerations should ex- 
tend beyond the classroom, into the workplace. 
This is acknowledged, in part, within ACRL’s 
new draft “Framework for Information Lit- 
eracy for Higher Education,” which advocates 
for a more contextualized understanding of the 
information ecologies in which students are 
immersed. Embedded within this document 
are statements on preparing students for pro- 
fessional work through developing their ability 
to work in teams and the need to better un- 
derstand the information literacy needs of stu- 
dents enrolled in professional degree programs. 

The drafts of the new information literacy 
framework reflect findings from library science 
research on how and to what extent informa- 
tion literacy is applied in the workplace. A re- 
cent report from Project Information Literacy 
described its findings on how information lit- 
eracy skills are put to use by students who have 
joined the workforce (Head, 2012). Researchers 
sought perspectives from both employers and 
employees regarding the information-seeking 
behaviors of recently hired college graduates. 
Among their findings was the recognition that 
employers valued information literacy profi- 
ciencies in new hires, but that new hires did 
not always apply these skills effectively. New 
hires often defaulted to using information that 
could be found quickly using a search engine 
rather than using other sources of information 
or demonstrating persistence in seeking infor- 
mation that would address their needs more 
effectively. In addition, new hires formed adap- 
tive strategies for addressing their information 
needs, typically through trial and error. 
Disconnections between information literacy 
as taught in academic settings and the informa- 
tion literacies applied or needed in the work- 
place are found in other studies as well. Weiner 
(2011) noted that the complex, unstructured, 
and open-ended nature of the workplace con- 
trasts with the more prescribed and directed at- 
mosphere of education. Lloyd and Williamson 
(2008) took this observation a step further by 
noting that the generalizations of research done 
in educational environments do not necessarily 
reflect the realities of information needs in the 
workplace. They found that there is a multitude 
of possible workplaces, each with its own set of 
contextualized practices, norms, and expecta- 
tions that make it difficult for information liter- 
acy (as typically defined by librarians) to trans- 
late effectively outside of a text-based research 
environment. Instead of viewing information 
literacy as a set of skills to master, they argued 
that educators must see it as a holistic practice 
that considers environmental context as well as 
the social and physical experiences of the person 
with information. 

Research into information behaviors and 
needs in the workplace continues to be an 
important area for informing information lit- 
eracy theories and programs. Similar explora- 
tions are needed to inform the development 
of DIL, as many students go into jobs outside 
of academia. As companies become more and 
more data driven, new employees need to be 
equipped to work in data-intensive environ- 
ments and excel as responsible data stew- 
ards. We were not able to address this with 
much depth in the DIL project; however, we 
recognized the large role that the environ- 
ment, expectations, and needs of employers 
will play in shaping educational programming 
surrounding data management and curation. 
For example, the Carlson and Sapp Nelson 
team from Purdue worked with students de- 
veloping software code as a component of 
their participation in the Engineering Projects 
in Community Service (EPICS) program. The 
literature review revealed concerns regarding 
how code was managed and organized within 
software companies. This team spoke with a 
few managers at software firms and heard con- 
cerns about similar issues that arose in their 
needs assessment with the faculty and students 
in the EPICS program: insufficient documen- 
tation, difficulties in handing off code to other 
teams, and quality assurance challenges. Look- 
ing forward, we need to be able to move be- 
yond anecdotes to an objective understanding 
of how to respond to data management and 
curation needs in the workplace. Just as the 
information literacy community has begun to 
investigate the needs of the workplace to in- 
form program development, the DIL commu- 
nity needs to conduct research into the prac- 
tices and needs of the workplace with regard to 
working with data. 


FURTHER DEVELOPING DATA 
INFORMATION LITERACY: 
A COMMUNITY PERSPECTIVE 


The DIL project team held a symposium at 
Purdue University on September 23 and 24, 
2013. The intent of the symposium was to fos- 
ter a community of practice in research libraries 
centered on developing and implementing sus- 
tainable institutional DIL programs. Although 
the symposium included presentations from 
the DIL project teams about the work that they 
had done, the primary focus was on synthesiz- 
ing what we learned. This was so that we could 
provide practical guidance for others to create 
DIL programs as well as articulate potential 
roles and responsibilities for librarians in DIL. 
The symposium included presentations, dis- 
cussions, exercises, and other activities to en- 
gage participants on these topics. The schedule, 
videos, and materials used at the symposium 
are openly available at 
http://docs.lib.purdue.edu/dilsymposium/. 

Throughout the symposium, participants 
were encouraged to consider areas for further 
development in DIL, both within their own in- 
stitution and for a broader community of prac- 
tice. The final session of the symposium was a 
group discussion on this topic. The themes that 
emerged from this discussion are presented 
here. 


Raising Awareness 


The idea that librarians should provide research 
data services is taking root in many academic 
libraries; however, librarians teaching compe- 
tencies for working with research data is a rela- 
tively new development. Teaching DIL skills 
is a natural fit for librarians as information 
literacy is a central component of libraries. It 
is a logical step then to look to what we have 
learned about how librarians have developed 
information literacy programs to inform our 
efforts with data. 

It is important to recognize that information 
literacy was not universally accepted as a role by 
librarians even after the release of the landmark 
ACRL Presidential Committee on Information 
Literacy: Final Report in 1989, which codified 
the term (ACRL, 1989). Questions arose on 
the actual meaning of the term information lit- 
eracy and how it was fundamentally different 
from other roles such as bibliographic instruc- 
tion (Snavely & Cooper, 1997). Others pushed 
back against information literacy, dismissing 
it as a public relations exercise and a social 
problem that librarians invented to solve and 
reclaim relevancy (Foster, 1993). Getting the 
library community to embrace information lit- 
eracy required an investment of time and effort 
on the part of those who saw its potential for 
libraries and for organizations, such as ACRL, 
which fostered dialogue at national and inter- 
national levels. DIL is going through a similar 
gestation period where definitions, roles, and 
responsibilities are being discussed and debated 
in the library community. This will require ad- 
vocates who can speak passionately and articu- 
late paths toward advancing an awareness of 
DIL and how librarians could contribute. 

Raising awareness of DIL will also require 
investment and activism at the local level. Our 
ability to develop DIL programs will depend 
on our ability to present compelling arguments 
to colleagues in libraries and on campus. Craft- 
ing these arguments will be challenging since 
time and resources are issues for academic li- 
braries. Librarians may be reluctant to take on 
this responsibility, especially if it is not an ad- 
ministrative priority for the library. 

Most importantly we must raise awareness 
of DIL among the faculty, students, and ad- 
ministrators at our institutions. We must ar- 
ticulate clear messages that speak to the needs 
of stakeholders with regard to data. A central 
tenet of the DIL project was taking the time 
to know our partners’ environments, prac- 
tices, and challenges in working with data. 
We believe that this investment enabled us to 
forge meaningful connections with the fac- 
ulty and students. Most of the DIL teams are 
continuing to work with their faculty partners 
to refine the programs that they developed 
through this project. 


Forming Communities of Practice 


As interest and capacity for DIL take root, we 
need to find ways to come together as practi- 
tioners in this emerging field to form a com- 
munity of practice. Communities of practice 
facilitate the communication of information, 
strategies, and experiences, thereby enabling 
members to learn from each other in ways that 
foster professional development. They are im- 
portant for defining common terminologies 
and concepts, forging standards and best prac- 
tices, and identifying potential areas of growth. 

By design, DIL straddles two existing com- 
munities of practice: information literacy and 
data services. Information literacy communities 
are well established, having developed multiple 
communication venues, publications, and other 
support structures within the library profes- 
sion and beyond. ACRL information literacy 
standards have been widely accepted and ad- 
opted. On the other hand, data services is a less 
established field, though there are some profes- 
sional conferences and other venues for discus- 
sion, such as the International Digital Curation 
Conference 
(http://www.dcc.ac.uk/events/international-digital-curation-conference-idcc), 
IASSIST (http://iassistdata.org/), and the 
Research Data Access and Preservation Summit 
(http://www.asis.org/rdap/). We are also seeing 
an increasing number of publications and initia- 
tives that address data services provided by 
libraries, such as the Journal of eScience 
Librarianship 
(http://escholarship.umassmed.edu/jeslib/). 

The community of practice for data librar- 
ians is different from the community support- 
ing information literacy. Although librarians 
make up a sizable bloc of the membership 
of professional organizations and attendees at 
conferences, they are joined by information 
technologists, research faculty, data scientists, 
and others whose work centers on managing 
and curating data. Within the larger data com- 
munity there is much discussion regarding 
roles and responsibilities and the knowledge 
and skill sets needed to assume them. Roles in 
supporting data work that have been discussed 
include data creators (researchers), data scien- 
tists, data managers, data librarians, data stew- 
ards, and data publishers (Lyon, 2013; Pryor 
& Donnelly, 2009; Swan & Brown, 2008). 
Although roles and responsibilities are in flux, 
including multiple perspectives in the discus- 
sion encourages the inclusion of a wider range 
of issues and viewpoints. 

A foundational goal for those involved in in- 
formation literacy is to connect with other com- 
munities with complementary interests and aims 
(ACRL, 1989). An example is the 2013 ACRL 
report, which explored strategic alignments be- 
tween information literacy and scholarly com- 
munication, noting that they have multiple 
areas of mutual interest and that opportunities 
exist for collaboration to address these areas 
(ACRL, 2013). This report also included data 
literacy as one of the points of intersection. 

Today we are in the process of defining 
DIL. How communities of practice will form 
around DIL remains to be seen. Will DIL find 
a home as a component of a larger established 
community, such as data services or informa- 
tion literacy, or will it develop its own distinct 
community? Participants at the DIL Sympo- 
sium expressed an interest in creating a means 
of communicating and sharing information 
about resources and developments in DIL with 
one another through discussion lists or other 
channels. We did not want to create an addi- 
tional silo, but rather to grow and sustain con- 
nections with the communities from whom we 
could model and learn. As DIL becomes more 
recognized and accepted as a role for librarians, 
those engaged in DIL activities will have to 
consider what their needs are as a community 
and whether satisfying those needs would require 
a distinct community of practice, a presence 
within larger communities, or some combina- 
tion of both. 


Developing and Sharing Materials 


A component of forming and maintaining com- 
munities of practice will be developing a means 
to share approaches, methods, and materials 
in ways that those within (and outside of) the 
community can apply them. At the DIL Sym- 
posium, attendees referenced the different types 
of materials they would like to have to support 
their work. They spoke about the power of shar- 
ing real-life “data horror stories” to raise the in- 
terest of faculty and students and motivate them 
to attend educational programming. Several at- 
tendees stated that they would like to have illus- 
trations of how good practices in data manage- 
ment and curation resulted in positive changes 
for researchers, such as an increased impact for 
researchers who made their data sets openly avail- 
able, or specific benefits to a lab. Relevant stories 
have not been easy to find, but this is chang- 
ing. For example, figshare.com is posting success 
stories through social media; Dorothea Salo, a 
faculty associate at the University of 
Wisconsin-Madison’s School of Library and 
Information Studies, created a listing of 
“data horror stories” 
(https://pinboard.in/u:dsalo/t:horrorstories/); 
and DataONE collects and posts real-world 
data issues and challenges 
(https://notebooks.dataone.org/data-stories/). 

Participants in the DIL Symposium men- 
tioned their desire for a clearinghouse of edu- 
cational materials that could be used to gener- 
ate ideas or repurposed for use in a different 
program or environment. We are starting to see 
organizations create educational materials that 
support librarians and others in teaching data 
competencies. The University of Massachusetts 
Medical Center, with support from the Na- 
tional Library of Medicine and others, has in- 
vested considerable effort in developing data 
literacy curricula and learning modules that 
can be adapted 
(http://library.umassmed.edu/necdmc/index). 
DataONE has also developed education mod- 
ules that can be augmented and reused to meet 
local needs 
(http://www.dataone.org/education-modules). What 
is missing is a centralized repository for col- 
lecting materials that address a particular need 
in the DIL community, along with narratives 
that would provide the context for how these 
materials were used and the impact they made. 
Although locally created materials may be less 
adaptable than materials created with the spe- 
cific intent of repurposing, they provide insight 
into the development process, the approaches 
taken, and lessons learned. This was a primary 
goal in creating this book: to share the materi- 
als that we developed and our experiences in 
using them. 


Professional Development 


In this evolving environment we are seeing in- 
terest in DIL grow and opportunities for librar- 
ians to take initiative expand. It is important 
for librarians to educate themselves in these 
new skills so that they can take on DIL edu- 
cation in effective ways. However, the lack of 
models and curricula can make it difficult for 
librarians to prepare or respond to the opportu- 
nities on their own. The capacity and capabili- 
ties of librarians and others involved in teach- 
ing DIL or in developing programs will need to 
advance. Therefore, we must explore what pro- 
fessional development opportunities librarians 
need to develop their own competencies in data 
management and curation theories and prac- 
tices, as well as how to best teach these com- 
petencies to students. One possible approach 
comes from the Society of American Archivists 
(SAA). The SAA offers a certification program 
to educate its professional workforce on cu- 
rating born-digital archival materials. Its 
Digital Archives Specialization (DAS) program 
(http://www2.archivists.org/prof-education/das) 
requires participants to complete at least 
9 continuing education courses and pass a 
comprehensive 3-hour examination to receive 
the 5-year renewable certification. 

The DIL competencies were developed with 
an assumption that they would likely extend 
beyond the knowledge of a typical librarian, 
faculty member, or information technology 
(IT) professional. Launching a comprehensive 
DIL program requires multiple experts from a 
variety of units within the institution. One of 
the topics of conversation at the DIL Sympo- 
sium was the need to be able to connect with 
the faculty to understand their needs and to 
convey what the library community has to of- 
fer. Since librarians with subject liaison respon- 
sibilities connect with the faculty in the depart- 
ments they serve, they can be paired with data 
and/or information literacy librarians to de- 
velop and implement DIL programs. However, 
library liaisons may be uncomfortable with or 
unable to take on additional responsibilities in 
an unfamiliar area. Other librarians with spe- 
cialized expertise in areas such as metadata, 
digital repository management, or intellectual 
property can participate in the program. A community 
of practice in the library (and the larger institu- 
tion) will likely be needed. Developing such a 
community that spans the library organization 
would help reduce the barriers to participation 
in DIL programs and help ensure that commu- 
nity members’ knowledge, skills, and connec- 
tions are applied appropriately. 

A critical component of the success of an 
internal community of practice is the support 
received from the library’s administration. 


Carlson (2013) identified lack of organiza- 
tional support as one of the barriers to in- 
creased engagement of librarians in working 
with research data. In addition to securing 
needed approval and resources, library admin- 
istrators have contacts within the university 
administration and with others on campus to 
which other librarians may not have ready ac- 
cess. They may be able to help raise awareness 
about the DIL activities underway in our li- 
braries to larger audiences to help extend our 
reach. An important consideration in develop- 
ing sustainable DIL initiatives is what profes- 
sional development in DIL might mean for a 
library as an organization, in addition to indi- 
vidual librarians. 


Scoping Data Information Literacy 


A set of questions that arose at the DIL Sym- 
posium was about the balance between gen- 
eral best practices in working with data and 
disciplinary standards. Many disciplines do 
not have accepted standards surrounding the 
management, publication, and curation of re- 
search data. This makes it difficult to develop 
DIL programs that align with a student’s pro- 
fessional identity. Some of the DIL teams relied 
on established standards, using them as a foun- 
dation and adapting them to local practices. 
Other teams focused on developing solutions 
based on best practices relating to the DIL 
competencies generally and then tying them to 
existing local practices. Furthermore, some of 
the teams decided to incorporate several of the 
DIL competencies into their programs, while 
others chose to focus on just one or two of 
them. Other factors, such as specific issues and 
learning objectives to be addressed in the pro- 
gram, weighed heavily in the team’s determina- 
tion of the scope of their program. However, 
the driving factor in decisions of scope was the 
amount of student time and access available to 
each of the teams. 

It is not yet clear what, if any, the universal 
competencies for managing, sharing, and cu- 
rating data are and how they could be taught 
to an audience from different research fields. A 
symposium participant suggested that the 12 
DIL competencies could serve as a standard 
in the same way as ACRL’s (2000) Informa- 
tion Literacy Competency Standards for Higher 
Education has. There is some appeal to this 
idea as the DIL competencies are meant to 
be widely applicable across multiple fields of 
study. However, as noted in Chapter 10, the 
DIL competencies have not been fully vetted 
beyond the DIL project, so it is premature to 
anoint them as a standard. It is also worth not- 
ing that since Information Literacy Competency 
Standards for Higher Education was published 
in 2000, several discipline-specific information 
literacy standards have been created, including 
standards in science and engineering/technol- 
ogy (ALA/ACRL/STS Task Force, n.d.), an- 
thropology and sociology (ALA/ACRL/ANSS 
Task Force on IL Standards, 2008), and nurs- 
ing (Health Sciences Interest Group, 2013). 

We found that the DIL competencies were 
a useful framework for gathering information 
from faculty and students, for informing the 
DIL programs that we developed, and for fa- 
cilitating conversation and comparisons be- 
tween the five case studies. However, we rec- 
ognize that as more DIL programs take root 
there will likely be a need for librarians and 
others to craft specific or targeted variants of 
the DIL competencies. These variants may be 
based on disciplinary practices and needs, but 
they could also be based on a particular re- 
search method, data type, or context—for ex- 
ample, a set of competencies primarily focus- 
ing on sharing data outside of the lab. There is 
certainly plenty of opportunity for exploration 
beyond the foundational set of DIL compe- 
tencies that we employed in the DIL project, 
provided that we keep the focal point of DIL 
on addressing the real-world needs of research- 
ers through acquiring a solid understanding of 
their environments. 


Audiences for Data Information 
Literacy Programs 


The DIL Symposium participants raised ques- 
tions about expanding the target audience for 
DIL beyond graduate students in the science, 
technology, engineering, and mathematics 
(STEM) disciplines and expressed interest in 
developing DIL programs for undergraduate 
students. One of the recurring themes from 
the interviews with faculty was the assumption 
that graduate students had already had some 
exposure and experience in working with data 
prior to their coming to work in the lab—an 
assumption that was not always correct. DIL 
programs developed for undergraduate stu- 
dents would prepare them for a data-intensive 
workplace or facilitate their transition to grad- 
uate school, where they may be expected to 
assume responsibilities for developing, manag- 
ing, and working with data sets. A particular 
challenge in developing DIL programs for un- 
dergraduates will be tailoring these programs 
to the undergraduate environment. Unlike 
graduate students, undergraduates do not typi- 
cally have responsibilities that pertain to the 
production of data sets outside of a specialized 
undergraduate research opportunity program 
(UROP). Therefore it may be difficult to con- 
nect them in meaningful ways to the issues 
that arise when working with data. However, 
undergraduates are often consumers of data 
sets, and developing a DIL program from that 
perspective may serve as a useful introduction. 
In addition, many colleges and universities 
have programs that provide undergraduates 
with opportunities to engage in research proj- 
ects, such as Michigan’s UROP 
(http://www.lsa.umich.edu/urop/) or the National Science 
Foundation’s Sponsored Research Experiences 
for Undergraduates (REU) programs 
(http://www.nsf.gov/crssprgm/reu/). These programs 
can serve as potential points of entry for DIL. 
We expect the interest in undergraduate educa- 
tion on data topics to increase as more atten- 
tion is given to the value of well-managed data 
sets and the need for an educated workforce to 
steward them. 

There may be other audiences for DIL pro- 
grams beyond students. Faculty may benefit 
from instruction on data management and 
curation, but that would pose multiple chal- 
lenges. As busy as graduate students are, fac- 
ulty are even busier. Faculty are also experts in 
their fields and may require a much different 
approach in instruction than students. Fur- 
thermore, faculty may have developed familiar 
routines, even if they acknowledge that these 
routines are less than ideal. Faculty may be re- 
luctant to commit to changes in working with 
data if learning curves are deemed too high or 
the immediate benefit is not clear and does not 
outweigh the perceived costs of investment. 
Lab or IT staff who are tasked with administer- 
ing and stewarding data sets may be motivated 
to participate in a DIL program. 


CONCLUSION 


The time is ripe to develop the role of librarians 
and other information science professionals in 
delivering DIL programs and to form commu- 
nities of practice to support these endeavors. 
The information literacy community can serve 
as a useful point of reference. In addition, the 
intersections between data, information liter- 
acy, and other communities within the library 
field should be recognized and cultivated. Pro- 
viding DIL programming requires the involve- 
ment of individuals with different skill sets and 
perspectives within (and outside of) libraries. 

This chapter identified growth areas for edu- 
cational programming for graduate students in 
working with research data. The response that 
the DIL project has received from faculty, stu- 
dents, administrators, and others at our respec- 
tive institutions has been phenomenal, and we 
expect a high level of interest to continue. The 
DIL project itself ended, but the work that the 
five DIL teams did at four academic institu- 
tions continues to pay dividends as we pursue 
our individual efforts. This is truly an emerging 
area of need and one in which librarians can 
play a significant leadership and teaching role. 
We look forward to seeing DIL and supporting 
communities of practice take root in the com- 
ing years. 